Releases: pdfminer/pdfminer.six

20240706

06 Jul 13:48

Added

  • Support for zipped JPEGs (#938)
  • Fuzzing harnesses for integration into Google's OSS-Fuzz (#949)
  • Support for setuptools-git-versioning version 2.0.0 (#957)

Fixed

  • Resolving mediabox and pdffont (#834)
  • Keywords that aren't terminated by the pattern END_KEYWORD before end-of-stream are parsed (#885)
  • Wrong ValueError message when specifying codec for text output (#902)
  • Resolve stream filter parameters (#906)
  • Reading cmaps with whitespace in the name (#935)
  • Optimize apply_png_predictor by using lists (#912)

Changed

  • Updated Python 3.7 syntax to 3.8 (#956)
  • Updated all Python version specifications to a minimum of 3.8 (#969)

20231228

28 Dec 21:25

Added

  • Output converter for the hOCR format (#651)
  • Font name aliases for Arial, Courier New and Times New Roman (#790)
  • Documentation on why special characters can sometimes not be extracted (#829)
  • Storing Bezier path and dashing style of line in LTCurve (#801)

Fixed

  • Broken CI/CD pipeline by setting upper version limit for black, mypy, pip and setuptools (#921)
  • flake8 failures (#921)
  • ValueError when bmp images with 1 bit channel are decoded (#773)
  • ValueError when trying to decrypt empty metadata values (#766)
  • Sphinx errors during building of documentation (#760)
  • TypeError when getting default width of font (#720)
  • Installing typing-extensions on Python 3.6 and 3.7 (#775)
  • TypeError in cmapdb.py when parsing null characters (#768)
  • Color "convenience operators" now (per spec) also set color space (#794)
  • ValueError when extracting images, due to breaking changes in Pillow (#827)
  • Small typos and issues in the documentation (#828)
  • Ignore non-Unicode cmaps in TrueType fonts (#806)

Changed

  • Using non-hardcoded version string and setuptools-git-versioning to enable installation from source and building on Python 3.12 (#922)

Deprecated

  • Usage of if __name__ == "__main__" where it was only intended for testing purposes (#756)

Removed

  • Support for Python 3.6 and 3.7 because they are end-of-life (#923)

20221105

05 Nov 16:33
ebf7bcd

Added

  • Output converter for the hOCR format (#651)
  • Font name aliases for Arial, Courier New and Times New Roman (#790)
  • Documentation on why special characters can sometimes not be extracted (#829)

Fixed

  • ValueError when bmp images with 1 bit channel are decoded (#773)
  • ValueError when trying to decrypt empty metadata values (#766)
  • Sphinx errors during building of documentation (#760)
  • TypeError when getting default width of font (#720)
  • Installing typing-extensions on Python 3.6 and 3.7 (#775)
  • TypeError in cmapdb.py when parsing null characters (#768)
  • Color "convenience operators" now (per spec) also set color space (#794)
  • ValueError when extracting images, due to breaking changes in Pillow (#827)
  • Small typos and issues in the documentation (#828)

Deprecated

  • Usage of if __name__ == "__main__" where it was only intended for testing purposes (#756)

20220524

24 May 17:44
0b09d5f

Fixed

  • Ignoring (invalid) path constructors that do not begin with m (#749)

Changed

  • Removed upper version bounds (#755)

20220506

06 May 20:04

Fixed

  • IndexError when handling invalid bfrange code map in CMap (#731)
  • TypeError in lzw.py when self.table is not set (#732)
  • TypeError in encodingdb.py when name of unicode is not str (#733)
  • TypeError in HTMLConverter when using a bytes fontname (#734)

Added

  • Exporting images without any specific encoding (#737)

Changed

  • Using charset-normalizer instead of chardet for less restrictive license (#744)

20220319

19 Mar 20:13

Added

  • Export type annotations from pypi package per PEP561 (#679)
  • Support for identity cmaps (#626)
  • Add support for PDF page labels (#680)
  • Installation of Pillow as an optional extra dependency (#714)

Fixed

  • Handle decompression error due to CRC checksum error (#637)
  • Regression (since 20191107) in LTLayoutContainer.group_textboxes that returned some text lines out of order (#659)
  • Add handling of JPXDecode filter to enable extraction of images for some pdfs (#645)
  • Fix extraction of jbig2 files, which was producing invalid files (#652)
  • Crash in pdf2txt.py --boxes-flow=disabled (#682)
  • Only use xref fallback if PDFNoValidXRef is raised and fallback is True (#684)
  • Ignore empty characters when analyzing layout (#499)

Changed

  • Replace warnings.warn with logging.Logger.warning in line with recommended use (#673)
  • Switched from nose to pytest, from tox to nox and from Travis CI to GitHub Actions (#704)
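
Because warnings now go through logging rather than warnings.warn, callers who want to see or capture them attach a handler instead of a warnings filter. The snippet below is a minimal sketch that uses only the standard library. The logger name "pdfminer" assumes the package creates its loggers per module under that namespace, and "pdfminer.example" is a hypothetical stand-in for a real library module.

```python
import io
import logging

# Collect log output in memory so it can be inspected afterwards.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s: %(message)s"))

# Assumption: library loggers live under the "pdfminer" namespace, so a
# handler on the parent logger receives records from all child loggers.
package_logger = logging.getLogger("pdfminer")
package_logger.addHandler(handler)
package_logger.setLevel(logging.WARNING)

# Hypothetical stand-in for a warning emitted inside the library.
logging.getLogger("pdfminer.example").warning("could not parse object %d", 7)

captured = buffer.getvalue()
print(captured, end="")
```

Raising the level on the same logger, for example to logging.ERROR, silences these warnings without touching the warnings module.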

Removed

  • Unnecessary return statements without argument at the end of functions (#707)

20211012

19 Mar 16:49

Added

  • Add support for PDF 2.0 (ISO 32000-2) AES-256 encryption (#614)
  • Support for Paeth PNG filter compression (predictor value = 4) (#537)
  • Type annotations (#661)
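
For readers unfamiliar with the Paeth filter mentioned above, the following is a minimal sketch of how one Paeth-filtered row is reversed, following the PNG specification. It is not pdfminer.six's implementation. The function names and the bpp (bytes per pixel) parameter are chosen here for illustration.

```python
def paeth_predictor(left: int, above: int, upper_left: int) -> int:
    """Return whichever neighbour is closest to left + above - upper_left."""
    estimate = left + above - upper_left
    dist_left = abs(estimate - left)
    dist_above = abs(estimate - above)
    dist_upper_left = abs(estimate - upper_left)
    # Ties are broken in the order left, above, upper-left.
    if dist_left <= dist_above and dist_left <= dist_upper_left:
        return left
    if dist_above <= dist_upper_left:
        return above
    return upper_left


def undo_paeth_row(filtered: bytes, previous: bytes, bpp: int) -> bytes:
    """Reconstruct one row from its Paeth-filtered bytes.

    `previous` is the already reconstructed row above (all zeros for the
    first row); `bpp` is the number of bytes per complete pixel.
    """
    out = bytearray()
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        above = previous[i]
        upper_left = previous[i - bpp] if i >= bpp else 0
        # Each byte stores the difference from the prediction, modulo 256.
        out.append((value + paeth_predictor(left, above, upper_left)) & 0xFF)
    return bytes(out)


first_row = undo_paeth_row(bytes([1, 1, 1]), bytes(3), bpp=1)
print(list(first_row))
```

With an all-zero previous row the predictor always falls back to the left neighbour, so the filtered bytes act as running differences along the row.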

Fixed

  • KeyError when 'Encrypt' but not 'ID' present in trailer (#594)
  • Fix issue of ValueError and KeyError raised in PDFDocument and PDFParser (#573)
  • Fix issue of TypeError: cannot unpack non-iterable PDFObjRef object, when unpacking the value of 'DW2' (#529)
  • Fix PermissionError when creating temporary filepaths on Windows when running tests (#484)
  • Fix AttributeError when dumping a TOC with bytes destinations (#600)
  • Fix issue where some Chinese characters could not be extracted correctly (#593)
  • Detecting trailer correctly when surrounded with needless whitespace (#535)
  • Fix .paint_path logic for handling single line segments and extracting point-on-curve positions of Bézier path commands (#530)
  • Raising UnboundLocalError when a bad --output-type is used (#610)
  • TypeError when using TagExtractor with non-string or non-bytes tag values (#610)
  • Using io.TextIOBase as the file to write to (#616)
  • Parsing \r\n after the escape character in a literal string (#616)

Removed

  • Support for Python 3.4 and 3.5 (#522)
  • Unused dependency on sortedcontainers package (#525)
  • Support for non-standard output streams that are not binary (#523)
  • Dependency on typing-extensions introduced by #661 (#677)

20201018

18 Oct 11:09

Deprecated

  • Support for Python 3.4 and 3.5 (#503)

Added

  • Option to disable boxes flow layout analysis when using pdf2txt (#479)
  • Support for pathlib.PurePath in open_filename (#491)

Fixed

  • Pass caching parameter to PDFResourceManager in high_level functions (#475)
  • Fix .paint_path logic for handling non-rect quadrilaterals and decomposing complex paths (#473)
  • Fix out-of-bound access on some PDFs (#483)

Removed

  • Remove unused rijndael encryption implementation (#465)

20200726

30 Jul 06:57

Fixed

  • Rename PDFTextExtractionNotAllowedError to PDFTextExtractionNotAllowed to revert breaking change (#461)
  • Always try to get CMap, not only for identity encodings (#438)

20200720

20 Jul 20:16

Added

  • Support for painting multiple rectangles at once (#371)

Fixed

  • Validate image object in do_EI is a PDFStream (#451)

Changed

  • Hiding fallback xref by default from dumppdf.py output (#431)
  • Raise a warning instead of an error when extracting text from a non-extractable PDF (#350)
  • Switched from pycryptodome to cryptography package for AES decryption (#456)