VikParuchuri / pdftext Public

Releases: VikParuchuri/pdftext

Ignore special chars, break lines more aggressively

17 Oct 18:51

VikParuchuri

v0.3.14

7460bf4

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Merge pull request #12 from VikParuchuri/dev

Improve line breaks, ignore special chars

Fix flattening bug

08 Oct 16:07

VikParuchuri

v0.3.13

5915750

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Merge pull request #11 from VikParuchuri/dev

Fix bug with flattening

Fix document loading bug

08 Oct 13:14

VikParuchuri

v0.3.12

56af2c1

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Fixed a bug where PDF paths were assumed to be strings, which is not always the case.
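
The note does not show the fix itself. As a minimal sketch of the general pattern (not pdftext's actual code), a loader can normalize any path-like input to a str with os.fspath before applying string operations to it. The helper normalize_pdf_path is hypothetical.

```python
import os
from pathlib import Path
from typing import Union

# Any of these may arrive as a "PDF path"; only str supports string methods directly
PathInput = Union[str, bytes, os.PathLike]


def normalize_pdf_path(pdf_path: PathInput) -> str:
    """Return a str path for str, bytes, or os.PathLike (e.g. pathlib.Path) input."""
    # os.fspath returns str or bytes, calling __fspath__ on path-like objects
    path = os.fspath(pdf_path)
    if isinstance(path, bytes):
        # Decode bytes paths with the filesystem encoding
        path = os.fsdecode(path)
    return path


# Without normalization, a string-only call such as .endswith() fails on a Path
print(normalize_pdf_path(Path("report.pdf")).endswith(".pdf"))
```

Normalizing once at the entry point keeps the rest of the code free to assume a str.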

ONNX model, option to flatten form fields

08 Oct 02:36

VikParuchuri

v0.3.11

c4f0d34

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

- Faster inference with ONNX
- Removed the warning shown when loading the scikit-learn model
- Added an option to flatten form fields into the PDF

Fix bbox bug

27 May 22:59

VikParuchuri

v0.3.10

2557089

Fixed a bug where bounding boxes (bboxes) were not unnormalized correctly.
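
For readers unfamiliar with the term, "unnormalizing" a bbox means scaling coordinates stored as fractions of the page back to page units. The sketch below assumes bboxes are (x0, y0, x1, y1) tuples with values between 0 and 1; pdftext's actual coordinate convention may differ, and unnormalize_bbox is a hypothetical helper.

```python
from typing import Tuple

# (x0, y0, x1, y1); assumed convention for this sketch
BBox = Tuple[float, float, float, float]


def unnormalize_bbox(bbox: BBox, page_width: float, page_height: float) -> BBox:
    """Scale a bbox whose coordinates are page fractions (0 to 1) back to page units."""
    x0, y0, x1, y1 = bbox
    # x coordinates scale with page width, y coordinates with page height;
    # mixing these up produces wrong boxes on any non-square page
    return (x0 * page_width, y0 * page_height, x1 * page_width, y1 * page_height)


print(unnormalize_bbox((0.25, 0.5, 0.75, 1.0), page_width=600, page_height=800))
# prints (150.0, 400.0, 450.0, 800.0)
```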

Minor performance optimizations

24 May 18:00

VikParuchuri

v0.3.9

51266d8

Optimized dictionary access and loops for a roughly 10% speedup.
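
The note does not show which loops changed. As a hedged illustration of this kind of micro-optimization in general, the sketch hoists a loop-invariant dictionary lookup out of the loop and iterates directly instead of indexing. The function names and the character data layout are hypothetical, and the ~10% figure applies to pdftext, not to this snippet.

```python
from typing import Dict, List

Char = Dict[str, float]


def total_width_before(chars: List[Char], settings: Dict[str, float]) -> float:
    total = 0.0
    for i in range(len(chars)):
        # Indexes the list and looks up settings["scale"] on every iteration
        total += chars[i]["width"] * settings["scale"]
    return total


def total_width_after(chars: List[Char], settings: Dict[str, float]) -> float:
    scale = settings["scale"]  # loop-invariant lookup hoisted out of the loop
    total = 0.0
    for char in chars:  # iterate directly instead of indexing by position
        total += char["width"] * scale
    return total
```

Both versions perform the same arithmetic in the same order, so they return identical results; only the per-iteration overhead differs, which can be measured with the standard timeit module.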

Add optional parallel workers

23 May 19:30

VikParuchuri

v0.3.8

37d1caf

Added optional parallel workers for text extraction. Parallel workers can slow down extraction on small PDFs, but can speed it up by 2x or more on larger ones. Enable them with the --workers flag on the CLI, or with the workers kwarg in Python.
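
The trade-off described above comes from process start-up and data-transfer overhead. A minimal sketch of per-page parallelism with the standard library, assuming pages can be extracted independently; extract_page and extract_all are hypothetical stand-ins, not pdftext's internals. Falling back to a sequential loop for one worker or a one-page document is a design choice of this sketch.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List


def extract_page(page_number: int) -> str:
    # Hypothetical stand-in for real per-page text extraction
    return f"text of page {page_number}"


def extract_all(page_count: int, workers: int = 1) -> List[str]:
    pages = range(page_count)
    if workers <= 1 or page_count < 2:
        # Sequential path: no process start-up or pickling overhead
        return [extract_page(p) for p in pages]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so pages stay in document order
        return list(pool.map(extract_page, pages))
```

With the spawn start method (the default on Windows and macOS), call extract_all with workers greater than 1 only from under an if __name__ == "__main__": guard, so child processes can import the module safely.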
