Releases: mjakubowski84/parquet4s
v2.14.1
This release fixes generic projection over a group when multiple fields of the group are selected.
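For context, here is a minimal sketch of such a projection using the core module's generic API; the group name, field names, and file path are assumptions made up for the example:

```scala
import com.github.mjakubowski84.parquet4s.{Col, ParquetReader, Path}

object ProjectionExample extends App {
  // Projects two fields of the same group ("user"), the case fixed in this release.
  // "user.name", "user.age" and the file path are hypothetical.
  val records = ParquetReader
    .projectedGeneric(
      Col("user.name").as[String],
      Col("user.age").as[Int]
    )
    .read(Path("data.parquet"))

  try records.foreach(println) // each element is a generic RowParquetRecord
  finally records.close()
}
```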
v2.14.0
Version 2.14.0 brings a revolution to Parquet4s led mostly by @utkuaydn and @j-madden:
- Parquet4s now supports both Akka and Pekko 🥳 (see the sketch after this list)
- Upgrade to Scala 2.13.12 and 3.3.1
- Upgrade of SBT to 1.9.x and building project using sbt-projectmatrix
- Supporting legacy pyarrow lists in file reads
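To illustrate the Pekko support, a minimal read might look as follows; this sketch assumes the Pekko module mirrors the Akka `ParquetStreams` API, and the model and path are made up for the example:

```scala
import com.github.mjakubowski84.parquet4s.{ParquetStreams, Path}
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.scaladsl.Sink

// Hypothetical model for the example.
case class Data(id: Int, text: String)

object PekkoReadExample extends App {
  implicit val system: ActorSystem = ActorSystem()

  // fromParquet produces a Pekko Streams Source of Data elements.
  ParquetStreams.fromParquet
    .as[Data]
    .read(Path("data.parquet")) // hypothetical path
    .runWith(Sink.foreach(println))
    .onComplete(_ => system.terminate())(system.dispatcher)
}
```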
Big thanks to the contributors!
v2.13.0
Here it is! Proper support for Protobuf! In a Scala way!
Thanks to @huajiang-tubi, Parquet4S has a new module that allows reading and writing Parquet from and to Protobuf. It leverages ScalaPB so that you can use Scala case classes for the model in your Scala projects. And it is very easy to use! Please refer to the documentation for more details.
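For illustration, a minimal round trip with the new module might look like this; `MyMessage` stands in for any ScalaPB-generated message class and is an assumption of the sketch:

```scala
// Requires the parquet4s-scalapb module and a ScalaPB-generated MyMessage class.
import com.github.mjakubowski84.parquet4s.ScalaPBImplicits._
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

object ScalaPBExample extends App {
  val path = Path("messages.parquet") // hypothetical path

  // Write Protobuf messages to Parquet...
  ParquetWriter.of[MyMessage].writeAndClose(path, Seq(MyMessage(id = 1)))

  // ...and read them back as the same message type.
  val records = ParquetReader.as[MyMessage].read(path)
  try records.foreach(println)
  finally records.close()
}
```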
Other notable changes:
- Each module now has a `custom` function in the API for reading and writing Parquet using your custom internals
- `InMemoryOutputFile` becomes reusable
- FS2 updated to 3.9.2
- SLF4J updated to 2.0.9
Big thanks to @huajiang-tubi for his contributions!
v2.12.0
This release brings in many changes!
- Support for reading from and writing to abstract data interfaces. Together with @huajiang-tubi we added the ability to read from `org.apache.parquet.io.InputFile` and to write to `org.apache.parquet.io.OutputFile`. Additionally, @huajiang-tubi implemented `InMemoryInputFile` and `InMemoryOutputFile` for reading and writing Parquet files from/to bytes. All modules received the new functionality. The new API functions are defined as alternatives to the existing end-steps in builders that take a `Path`. Please mind that the new API is still marked as experimental, that is, it might be subject to change in subsequent minor releases. See the sketch after this list.
- Every module, including core, now tries to read partitions. Prior to 2.12.0, when using the core module, one needed to call `ParquetReader.as[MyData].partitioned` explicitly in order to scan a partitioned directory. It was designed that way to avoid I/O operations in the low-level module when one was sure that they were reading a single file. However, the underlying Parquet library still attempted to scan the directory. This release changes that behaviour. In order to support `InputFile`, we had to replace the existing Parquet abstraction with more low-level code. That allowed us to enrich the existing code and execute partition discovery during the existing directory scanning. In effect, the experience of reading partitions is consistent across all modules! Therefore, `ParquetReader.as[MyData].partitioned` is now deprecated and has no effect.
- Scala 3 is upgraded to the 3.3.0 LTS version.
- Various minor dependency updates.
- Stricter and more consistent code linting (thanks to the update to Scala 3.3.0).
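Here is a minimal sketch of the `InputFile` path in the core module; it builds a stock parquet-hadoop `HadoopInputFile`, and it assumes that the builder's `read` end-step accepts it as an alternative to a `Path` (the model and file name are made up):

```scala
import com.github.mjakubowski84.parquet4s.ParquetReader
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{Path => HadoopPath}
import org.apache.parquet.hadoop.util.HadoopInputFile

// Hypothetical model for the example.
case class Data(id: Int, text: String)

object InputFileExample extends App {
  // A stock parquet-hadoop implementation of org.apache.parquet.io.InputFile.
  val inputFile =
    HadoopInputFile.fromPath(new HadoopPath("data.parquet"), new Configuration())

  // The new, experimental end-step accepting an InputFile instead of a Path.
  val records = ParquetReader.as[Data].read(inputFile)
  try records.foreach(println)
  finally records.close()
}
```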
The introduced changes open up multiple new opportunities! Stay tuned, as more new features are coming soon!
v2.11.1
v2.11.0
- Fixes the `in` filter predicate so that the negation of `in` returns proper results (see the sketch after this list).
- Updates parquet-hadoop to 1.13.0.
- Updates Scala 3 to 3.2.2.
- Other minor dependency updates.
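For example, a negated `in` filter in the core module could look like this; the sketch assumes the filter API's `!` operator for negation, and the model, column, and path are made up:

```scala
import com.github.mjakubowski84.parquet4s.{Col, ParquetReader, Path}

// Hypothetical model for the example.
case class Data(id: Int, text: String)

object NotInExample extends App {
  // Keeps only rows whose id is NOT 1, 2 or 3; the negation of `in`
  // is exactly what this release fixes.
  val records = ParquetReader
    .as[Data]
    .filter(!(Col("id") in (1, 2, 3)))
    .read(Path("data.parquet"))

  try records.foreach(println)
  finally records.close()
}
```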
v2.10.1
Added support for reading and writing various legacy list formats. Please note that you still need to provide a schema to read a legacy list with projection and to write it. Parquet4s supports only the latest list format out of the box for those purposes.
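For instance, a sketch of reading a legacy list with projection in the core module, where the required schema is derived from the case class; the model and path are assumptions:

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}

// Hypothetical model; the `values` field is stored as a legacy list in the file.
case class LegacyData(values: List[Int])

object LegacyListExample extends App {
  // projectedAs derives the projection schema from LegacyData, which is
  // the schema you need to provide when projecting over a legacy list.
  val records = ParquetReader.projectedAs[LegacyData].read(Path("legacy.parquet"))
  try records.foreach(println)
  finally records.close()
}
```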
Big thanks to @txdv for the contribution.
Additionally, the release includes minor dependency upgrades.
v2.10.0
This release introduces a new `custom` function for writing a single Parquet file in Akka and FS2. By using `custom`, one can provide one's own instance of `org.apache.parquet.hadoop.ParquetWriter.Builder`. That gives the freedom to configure the writer for custom needs, including writing data encoded in Protobuf or Thrift. The API is marked as experimental and can be subject to change in the following releases.
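As an illustration, here is a parquet-hadoop builder of the kind you could hand to the new `custom` end-step; `MyProto` stands for any Protobuf-generated class (an assumption), and the exact wiring into the Akka or FS2 builder is omitted, as it differs per module:

```scala
import org.apache.hadoop.fs.Path
import org.apache.parquet.proto.ProtoParquetWriter

object CustomBuilderExample {
  // A ParquetWriter.Builder that produces Protobuf-encoded Parquet.
  // MyProto is a hypothetical Protobuf-generated class.
  val builder: ProtoParquetWriter.Builder[MyProto] =
    ProtoParquetWriter
      .builder[MyProto](new Path("custom.parquet"))
      .withMessage(classOf[MyProto])

  // This builder is what you pass to `custom` in the Akka or FS2 module.
}
```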
Great thanks to @flipp5b for implementing the new feature.
v2.9.0
Release 2.9.0 brings the following improvements and bug fixes:
- Fixes time zone offset calculation for unconventional time zones like `Europe/Warsaw`, which Java tends to interpret as `WMT +1:24:00`.
- From now on, `viaParquet` in the Akka and FS2 modules properly reacts to the `OVERWRITE` write mode configuration (see the sketch after this list). While the `CREATE` mode stays unchanged and Parquet4s adds new files to the directory, the `OVERWRITE` mode recreates the directory before writing. Please be cautious when using this mode! All files in the directory are deleted! `CREATE` remains the default mode.
- Minor dependency updates.
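For instance, switching `viaParquet` to the `OVERWRITE` mode in the Akka module might look like this; the model and target directory are assumptions of the sketch:

```scala
import com.github.mjakubowski84.parquet4s.{ParquetStreams, ParquetWriter, Path}
import org.apache.parquet.hadoop.ParquetFileWriter

// Hypothetical model for the example.
case class Data(id: Int, text: String)

object OverwriteExample {
  // A flow that recreates the target directory before writing.
  // Remember: OVERWRITE deletes all existing files in that directory!
  val parquetFlow = ParquetStreams.viaParquet
    .of[Data]
    .options(ParquetWriter.Options(writeMode = ParquetFileWriter.Mode.OVERWRITE))
    .write(Path("output-dir"))
}
```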
v2.8.0
The release fixes an issue with the handling of the Hadoop file system. Prior to 2.8.0, Parquet4s was closing the file system when it didn't need it anymore. However, as Hadoop caches file systems, when a program accessed HDFS again, the closed file system was retrieved. Of course, any operation on such a file system would fail.
Thanks to @flipp5b, the issue is fixed, and Parquet4s doesn't close file systems anymore.