Releases: mjakubowski84/parquet4s

v2.14.1

12 Nov 13:28

This release fixes generic projection over a group when multiple fields of the group are used.

v2.14.0

09 Nov 19:19

Version 2.14.0 brings a revolution to Parquet4s led mostly by @utkuaydn and @j-madden:

  • Parquet4s now supports both Akka and Pekko 🥳
  • Scala upgraded to 2.13.12 and 3.3.1
  • SBT upgraded to 1.9.x, and the project is now built with sbt-projectmatrix
  • Support for reading legacy pyarrow lists from files

Big thanks to the contributors!

v2.13.0

23 Sep 13:24

Here it is! Proper support for Protobuf, the Scala way!
Thanks to @huajiang-tubi, Parquet4S has a new module that allows reading and writing Parquet to and from Protobuf. It leverages ScalaPB, so you can use Scala case classes for the model in your Scala projects. And it is very easy to use! Please refer to the documentation for more details.
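A minimal sketch of how the new module can be used, assuming a hypothetical ScalaPB-generated message `Person` with fields `id` and `name`; the `ScalaPBImplicits` import follows the project documentation — check it against your Parquet4s version:

```scala
// Sketch of the parquet4s-scalapb module. `Person` stands for a
// hypothetical ScalaPB-generated message type.
import com.github.mjakubowski84.parquet4s.ScalaPBImplicits._
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

val people = Seq(Person(id = 1, name = "Alice"), Person(id = 2, name = "Bob"))

// Write Protobuf messages as Parquet...
ParquetWriter.of[Person].writeAndClose(Path("/tmp/people.parquet"), people)

// ...and read them back as the same generated type.
val readBack = ParquetReader.as[Person].read(Path("/tmp/people.parquet"))
try readBack.foreach(println)
finally readBack.close()
```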

Other notable changes:

  • Each module now has a custom function in the API for reading and writing Parquet using your custom internals
  • InMemoryOutputFile becomes reusable
  • FS2 updated to 3.9.2
  • SLF4J updated to 2.0.9

Big thanks to @huajiang-tubi for his contributions!

v2.12.0

07 Aug 17:58

This is a release that brings in many changes!

  1. Support for reading from and writing to abstract data interfaces.
    Together with @huajiang-tubi, we added the ability to read from org.apache.parquet.io.InputFile and to write to org.apache.parquet.io.OutputFile. Additionally, @huajiang-tubi implemented InMemoryInputFile and InMemoryOutputFile for reading and writing Parquet files from/to bytes. All modules received the new functionality. The new API functions are defined as alternatives to the existing Path-based end-steps in the builders. Please mind that the new API is still marked as experimental, that is, it may be subject to change in subsequent minor releases.

  2. Every module, including core, now attempts to read partitions. Prior to 2.12.0, when using the core module, one needed to explicitly call ParquetReader.as[MyData].partitioned... in order to scan a partitioned directory. It was designed that way to avoid I/O operations in the low-level module when one was sure they were reading a single file. However, the underlying Parquet library still attempted to scan the directory. This release changes that behaviour. In order to support InputFile, we needed to replace the existing Parquet abstraction with lower-level code. This allowed us to enrich the existing code and perform partition discovery during the existing directory scanning. As a result, the experience of reading partitions is consistent across all modules!
    Therefore, ParquetReader.as[MyData].partitioned is now marked as deprecated and has no effect.

  3. Scala 3 is upgraded to 3.3.0 LTS version.

  4. Various minor dependency updates.

  5. A more strict and consistent code linting (thanks to the update to Scala 3.3.0).
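The in-memory files from point 1 can be sketched as below. Accessor names such as `take()` and `InMemoryInputFile.fromBytes` follow the project documentation but should be treated as assumptions to verify against your Parquet4s version:

```scala
// Sketch of point 1: writing Parquet to bytes and reading it back,
// entirely in memory, without touching the file system.
import com.github.mjakubowski84.parquet4s.{
  InMemoryInputFile, InMemoryOutputFile, ParquetReader, ParquetWriter
}

case class Data(i: Int, text: String)

val outputFile = InMemoryOutputFile(initBufferSize = 1024)
ParquetWriter.of[Data].writeAndClose(outputFile, Seq(Data(1, "a"), Data(2, "b")))

// The written bytes can be shipped anywhere, e.g. over the network...
val bytes: Array[Byte] = outputFile.take()

// ...and read back directly from memory.
val inputFile = InMemoryInputFile.fromBytes(bytes)
val records = ParquetReader.as[Data].read(inputFile)
try records.foreach(println)
finally records.close()
```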

The introduced changes open up many new opportunities. Stay tuned, as more new features are coming soon!
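The deprecation from point 2 boils down to the following, assuming a hypothetical partitioned directory at `/tmp/table`:

```scala
// Sketch of point 2: partition discovery is now automatic in every module.
import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}

case class MyData(value: String)

val tableDir = Path("/tmp/table") // hypothetical partitioned directory

// Before 2.12.0 the core module needed an explicit step to scan partitions:
//   ParquetReader.as[MyData].partitioned.read(tableDir)

// From 2.12.0 on, the plain read discovers partitions automatically,
// and `.partitioned` is a deprecated no-op:
def readAll(): Unit = {
  val records = ParquetReader.as[MyData].read(tableDir)
  try records.foreach(println)
  finally records.close()
}
```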

v2.11.1

02 Jun 08:06

Additional logs in the FS2 module for debugging Hadoop connectivity issues while reading Parquet files, by @jeet23.

Minor dependency updates, most notably:

  • parquet-hadoop: 1.13.0 -> 1.13.1
  • fs2-core: 3.6.1 -> 3.7.0
  • cats-effect: 3.4.9 -> 3.5.0

v2.11.0

23 Apr 19:09

Fixes the filter predicate so that the negation of in returns correct results.

Updates parquet-hadoop to 1.13.0.
Updates Scala 3 to 3.2.2.
Other minor dependency updates.

v2.10.1

02 Apr 18:25

Added support for reading and writing various legacy list formats. Please note that you still need to provide a schema to read a legacy list with projection and to write it. Parquet4s supports only the latest list format out of the box for those purposes.

Big thanks to @txdv for the contribution.

Additionally, the release includes minor dependency upgrades.

v2.10.0

09 Feb 19:31

This release introduces a new custom function for writing a single Parquet file in Akka and FS2. By using custom, one can provide their own instance of org.apache.parquet.hadoop.ParquetWriter.Builder. That gives the freedom to configure a writer for custom needs, including writing data encoded in Protobuf or Thrift. The API is marked as experimental and may be subject to change in the following releases.
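As a sketch, such a builder can come straight from parquet-protobuf; `MyMessage` below is a hypothetical Protobuf-generated class, and the exact wiring of the builder into the module's custom end-step is an assumption — see the Parquet4s documentation for your module:

```scala
// Sketch: constructing your own org.apache.parquet.hadoop.ParquetWriter.Builder
// (here a Protobuf one, via parquet-protobuf) to hand to the new `custom`
// end-step of the Akka or FS2 module.
import org.apache.hadoop.fs.{Path => HadoopPath}
import org.apache.parquet.proto.ProtoParquetWriter

// `MyMessage` stands for a hypothetical Protobuf-generated message class.
val builder = ProtoParquetWriter
  .builder[MyMessage](new HadoopPath("/tmp/messages.parquet"))
  .withMessage(classOf[MyMessage])
```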
Great thanks to @flipp5b for implementing the new feature.

v2.9.0

28 Jan 12:27

Release 2.9.0 brings the following improvements and bug fixes:

  1. Fixes time zone offset calculation for unconventional time zones like Europe/Warsaw, which Java tends to interpret as WMT +1:24:00.
  2. From now on, viaParquet in the Akka and FS2 modules properly reacts to the OVERWRITE write mode configuration. While the CREATE mode stays unchanged and Parquet4s adds new files to the directory, the OVERWRITE mode recreates the directory before writing.
    Please be cautious when using this mode! All files in the directory are deleted!
    CREATE remains the default mode.
  3. Minor dependency updates.
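The write mode from point 2 is selected through the writer options. A minimal sketch of the knob, shown here with the core-module writer for brevity (the same options object is passed to the viaParquet builders in the Akka and FS2 modules):

```scala
// Sketch: choosing the OVERWRITE write mode. The default is Mode.CREATE.
import com.github.mjakubowski84.parquet4s.{ParquetWriter, Path}
import org.apache.parquet.hadoop.ParquetFileWriter

case class Event(id: Long)

val overwriteOptions = ParquetWriter.Options(
  writeMode = ParquetFileWriter.Mode.OVERWRITE
)

ParquetWriter
  .of[Event]
  .options(overwriteOptions)
  .writeAndClose(Path("/tmp/events.parquet"), Seq(Event(1L)))
```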

v2.8.0

18 Jan 19:30

The release fixes an issue with the handling of the Hadoop file system. Prior to 2.8.0, Parquet4s closed the file system when it no longer needed it. However, since Hadoop caches file systems, when a program accessed HDFS again, the closed file system was retrieved from the cache, and any operation on it would fail.
Thanks to @flipp5b, the issue is fixed, and Parquet4s no longer closes file systems.