Releases: mjakubowski84/parquet4s

v2.14.1

12 Nov 13:28

This release fixes generic projection over a group when multiple fields of the group are used.

v2.14.0

09 Nov 19:19

Version 2.14.0 brings a revolution to Parquet4s led mostly by @utkuaydn and @j-madden:

  • Parquet4s now supports both Akka and Pekko 🥳
  • Scala upgraded to 2.13.12 and 3.3.1
  • SBT upgraded to 1.9.x, and the project is now built with sbt-projectmatrix
  • Support for reading legacy pyarrow lists from files

Big thanks to the contributors!

v2.13.0

23 Sep 13:24

Here it is! Proper support for Protobuf, the Scala way!
Thanks to @huajiang-tubi, Parquet4S has a new module that allows reading and writing Parquet to and from Protobuf. It leverages ScalaPB, so you can use Scala case classes for the model in your Scala projects. And it is very easy to use! Please refer to the documentation for more details.
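A minimal sketch of how the new module can be used, assuming a hypothetical ScalaPB-generated message `Person` with fields `id` and `name`; the `ScalaPBImplicits` import follows the project documentation — check it against your Parquet4s version:

```scala
// Sketch of the parquet4s-scalapb module. `Person` stands for a
// hypothetical ScalaPB-generated message type.
import com.github.mjakubowski84.parquet4s.ScalaPBImplicits._
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

val people = Seq(Person(id = 1, name = "Alice"), Person(id = 2, name = "Bob"))

// Write Protobuf messages as Parquet...
ParquetWriter.of[Person].writeAndClose(Path("/tmp/people.parquet"), people)

// ...and read them back as the same generated type.
val readBack = ParquetReader.as[Person].read(Path("/tmp/people.parquet"))
try readBack.foreach(println)
finally readBack.close()
```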

Other notable changes:

  • Each module now has a custom function in the API for reading and writing Parquet using your custom internals
  • InMemoryOutputFile becomes reusable
  • FS2 updated to 3.9.2
  • SLF4J updated to 2.0.9

Big thanks to @huajiang-tubi for his contributions!

v2.12.0

07 Aug 17:58

This is a release that brings in many changes!

  1. Support for reading from and writing to abstract data interfaces.
    Together with @huajiang-tubi, we added the ability to read from org.apache.parquet.io.InputFile and to write to org.apache.parquet.io.OutputFile. Additionally, @huajiang-tubi implemented InMemoryInputFile and InMemoryOutputFile for reading and writing Parquet files from/to bytes. All modules received the new functionality. The new API functions are defined as alternatives to the existing Path-based end-steps in the builders. Please mind that the new API is still marked as experimental, that is, it may be subject to change in subsequent minor releases.

  2. Every module, including core, now attempts to read partitions. Prior to 2.12.0, when using the core module, one needed to explicitly call ParquetReader.as[MyData].partitioned... in order to scan a partitioned directory. It was designed that way to avoid I/O operations in the low-level module when one was sure they were reading a single file. However, the underlying Parquet library still attempted to scan the directory. This release changes that behaviour. In order to support InputFile, we needed to replace the existing Parquet abstraction with lower-level code. This allowed us to enrich the existing code and perform partition discovery during the existing directory scanning. As a result, the experience of reading partitions is consistent across all modules!
    Therefore, ParquetReader.as[MyData].partitioned is now marked as deprecated and has no effect.

  3. Scala 3 is upgraded to 3.3.0 LTS version.

  4. Various minor dependency updates.

  5. A more strict and consistent code linting (thanks to the update to Scala 3.3.0).
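The in-memory files from point 1 can be sketched as below. Accessor names such as `take()` and `InMemoryInputFile.fromBytes` follow the project documentation but should be treated as assumptions to verify against your Parquet4s version:

```scala
// Sketch of point 1: writing Parquet to bytes and reading it back,
// entirely in memory, without touching the file system.
import com.github.mjakubowski84.parquet4s.{
  InMemoryInputFile, InMemoryOutputFile, ParquetReader, ParquetWriter
}

case class Data(i: Int, text: String)

val outputFile = InMemoryOutputFile(initBufferSize = 1024)
ParquetWriter.of[Data].writeAndClose(outputFile, Seq(Data(1, "a"), Data(2, "b")))

// The written bytes can be shipped anywhere, e.g. over the network...
val bytes: Array[Byte] = outputFile.take()

// ...and read back directly from memory.
val inputFile = InMemoryInputFile.fromBytes(bytes)
val records = ParquetReader.as[Data].read(inputFile)
try records.foreach(println)
finally records.close()
```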

The introduced changes open up many new opportunities. Stay tuned, as more new features are coming soon!
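The deprecation from point 2 boils down to the following, assuming a hypothetical partitioned directory at `/tmp/table`:

```scala
// Sketch of point 2: partition discovery is now automatic in every module.
import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}

case class MyData(value: String)

val tableDir = Path("/tmp/table") // hypothetical partitioned directory

// Before 2.12.0 the core module needed an explicit step to scan partitions:
//   ParquetReader.as[MyData].partitioned.read(tableDir)

// From 2.12.0 on, the plain read discovers partitions automatically,
// and `.partitioned` is a deprecated no-op:
def readAll(): Unit = {
  val records = ParquetReader.as[MyData].read(tableDir)
  try records.foreach(println)
  finally records.close()
}
```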

v2.11.1

02 Jun 08:06

Additional logs in the FS2 module for debugging Hadoop connectivity issues while reading Parquet files, by @jeet23.

Minor dependency updates, most notably:

  • parquet-hadoop: 1.13.0 -> 1.13.1
  • fs2-core: 3.6.1 -> 3.7.0
  • cats-effect: 3.4.9 -> 3.5.0

v2.11.0

23 Apr 19:09

Fixes the filter predicate so that the negation of in returns correct results.

Updates parquet-hadoop to 1.13.0.
Updates Scala 3 to 3.2.2.
Other minor dependency updates.

v2.10.1

02 Apr 18:25

Added support for reading and writing various legacy list formats. Please note that you still need to provide a schema to read a legacy list with projection and to write it. Parquet4s supports only the latest list format out of the box for those purposes.

Big thanks to @txdv for the contribution.

Additionally, the release includes minor dependency upgrades.

v2.10.0

09 Feb 19:31

This release introduces a new custom function for writing a single Parquet file in Akka and FS2. By using custom, one can provide their own instance of org.apache.parquet.hadoop.ParquetWriter.Builder. That gives the freedom to configure a writer for custom needs, including writing data encoded in Protobuf or Thrift. The API is marked as experimental and may be subject to change in the following releases.
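As a sketch, such a builder can come straight from parquet-protobuf; `MyMessage` below is a hypothetical Protobuf-generated class, and the exact wiring of the builder into the module's custom end-step is an assumption — see the Parquet4s documentation for your module:

```scala
// Sketch: constructing your own org.apache.parquet.hadoop.ParquetWriter.Builder
// (here a Protobuf one, via parquet-protobuf) to hand to the new `custom`
// end-step of the Akka or FS2 module.
import org.apache.hadoop.fs.{Path => HadoopPath}
import org.apache.parquet.proto.ProtoParquetWriter

// `MyMessage` stands for a hypothetical Protobuf-generated message class.
val builder = ProtoParquetWriter
  .builder[MyMessage](new HadoopPath("/tmp/messages.parquet"))
  .withMessage(classOf[MyMessage])
```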
Great thanks to @flipp5b for implementing the new feature.

v2.9.0

28 Jan 12:27

Release 2.9.0 brings the following improvements and bug fixes:

  1. Fixes time zone offset calculation for unconventional time zones like Europe/Warsaw, which Java tends to interpret as WMT +1:24:00.
  2. From now on, viaParquet in the Akka and FS2 modules properly reacts to the OVERWRITE write mode configuration. While the CREATE mode stays unchanged and Parquet4s adds new files to the directory, the OVERWRITE mode recreates the directory before writing.
    Please be cautious when using this mode! All files in the directory are deleted!
    CREATE remains the default mode.
  3. Minor dependency updates.
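The write mode from point 2 is selected through the writer options. A minimal sketch of the knob, shown here with the core-module writer for brevity (the same options object is passed to the viaParquet builders in the Akka and FS2 modules):

```scala
// Sketch: choosing the OVERWRITE write mode. The default is Mode.CREATE.
import com.github.mjakubowski84.parquet4s.{ParquetWriter, Path}
import org.apache.parquet.hadoop.ParquetFileWriter

case class Event(id: Long)

val overwriteOptions = ParquetWriter.Options(
  writeMode = ParquetFileWriter.Mode.OVERWRITE
)

ParquetWriter
  .of[Event]
  .options(overwriteOptions)
  .writeAndClose(Path("/tmp/events.parquet"), Seq(Event(1L)))
```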

v2.8.0

18 Jan 19:30

The release fixes an issue with the handling of the Hadoop file system. Prior to 2.8.0, Parquet4s closed the file system when it no longer needed it. However, since Hadoop caches file systems, when a program accessed HDFS again, the closed file system was retrieved from the cache, and any operation on it would fail.
Thanks to @flipp5b, the issue is fixed, and Parquet4s no longer closes file systems.