Releases: icaropires/pdf2dataset
v0.5.3
v0.5.2
Changes
- @features decorator is more flexible, no longer requiring pyarrow_type for helper features #19
- @features decorator now supports receiving arguments to be passed to pyarrow_type #20
- Number of total tasks when resuming processing is now fixed #18
- Don't use any kind of pools when using num_cpus=1
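The idea behind skipping pools for a single CPU can be sketched as follows. This is a generic illustration using the standard library's multiprocessing module, not the project's actual implementation; the names run_tasks and square are hypothetical, and only the num_cpus parameter name comes from the release notes.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical stand-in for a per-page extraction task.
    return x * x


def run_tasks(func, tasks, num_cpus=1):
    """Apply func to every task, creating a pool only when it can help."""
    if num_cpus == 1:
        # Serial path: no worker processes are spawned, so there is no
        # pickling overhead and tracebacks point directly at the failing task.
        return [func(task) for task in tasks]

    # Parallel path: distribute the tasks across num_cpus worker processes.
    with Pool(processes=num_cpus) as pool:
        return pool.map(func, tasks)


if __name__ == "__main__":
    print(run_tasks(square, [1, 2, 3], num_cpus=1))
```

Running serially in this case also tends to make debugging easier, since breakpoints and exceptions stay in the main process.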
Known issues
- Problem with memoization when implementing a custom tasks class #21
v0.5.1
Rework base structure
Just renamed from v4.0.0 which was wrong!
Changes
- Refactor most of the code structure (including tests) to scale with the number of extracted features
- Add support for specifying custom features through inheritance
- Add image feature
- Support multiple params to customize text and image extraction (image size, OCR image size, image format, etc.)
- Update dependencies
- General small fixes
Known issues
- Saving progress had to be disabled for this release #8. Will be fixed in the next one
v0.5.0
Changes
- Rework "resuming progress" feature
- Add support for receiving a list of files to be processed
- Improve code quality
Fix performance
Changes
- v0.4.0 caused some performance problems; this release fixes them
Small improvements
Changes
- Raise exception for invalid input_dir
- Add maximum chunksize default constraint
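A capped chunksize could be computed along these lines. This is a minimal sketch under assumptions: the function name compute_chunksize, its parameters, and the default cap of 100 are all hypothetical and are not taken from the project.

```python
import math


def compute_chunksize(num_tasks, num_workers, max_chunksize=100):
    """Split tasks evenly across workers, never exceeding max_chunksize.

    The cap (an assumed default of 100) keeps any single chunk from
    holding too many results in memory at once.
    """
    if num_tasks <= 0:
        return 1  # Nothing to process; fall back to the smallest valid size.

    # Even split, rounded up so that every task is covered.
    per_worker = math.ceil(num_tasks / num_workers)

    # Clamp to the range [1, max_chunksize].
    return max(1, min(per_worker, max_chunksize))
```

For example, under these assumptions 10 tasks on 4 workers give a chunksize of 3, while 10000 tasks on 4 workers are clamped to 100.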
Bug Fix
Changes
- Raise exceptions for invalid page numbers when specifying tasks
New features
Changes
- Add ability to pass specific tasks to be calculated
- Add ability to return a list instead of pandas DataFrame
Fix High Memory Usage
Changes
- Fix high memory usage caused by last release #3
- Chunksize was not really being calculated