
koheesio-v0.9.0

@dannymeijer dannymeijer released this 29 Nov 15:28
· 8 commits to main since this release

What's Changed

v0.9 brings many changes to the spark module, adding support for pyspark connect along with a number of bug fixes and some new features. Additionally, the Snowflake implementation has been significantly reworked and now relies on a pure Python implementation for interacting with Snowflake outside of Spark.

New features / Refactors

The following new features are included with 0.9:

Bug fixes

The following bug fixes are included with 0.9:

New Contributors

Big shout out to all contributors and a heartfelt welcome to our new contributors:

Migrating from v0.8

For users currently using v0.8, consider the following:

  • Spark Connect is now fully supported. To make this work, we had to introduce several replacement types for pyspark, such as DataFrame (i.e. pyspark.sql.DataFrame vs pyspark.sql.connect.dataframe.DataFrame) and SparkSession. If your custom Step logic references Spark types, import them from the koheesio.spark module instead. This lets your custom code work with pyspark connect as well.

  • Snowflake was extensively reworked.

    • To use Snowflake, a new extra / feature was added to the pyproject.toml. Install it using koheesio[snowflake] to get access to the Snowflake Python support
    • Code for snowflake support was moved to new primary modules:
      • koheesio.integrations.spark.snowflake hosts all spark related snowflake code
      • koheesio.integrations.snowflake hosts the non-spark / pure-python implementations
      • The original API was kept in place through pass-through imports; no immediate code changes should be needed
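
As a rough illustration of the migration points above, the sketch below imports the Spark types from koheesio.spark (as the notes recommend) and checks whether the two new Snowflake modules can be imported in the current environment. The helper names `load_spark_types` and `check_snowflake_modules` are hypothetical and not part of koheesio; only the module paths and the `DataFrame` / `SparkSession` names come from the notes. It assumes nothing about which extras are installed and simply reports what is available.

```python
from importlib import import_module

# Module paths quoted from the migration notes above.
SNOWFLAKE_MODULES = (
    "koheesio.integrations.snowflake",        # non-spark / pure-python code
    "koheesio.integrations.spark.snowflake",  # spark related snowflake code
)


def load_spark_types():
    """Hypothetical helper: return (DataFrame, SparkSession) taken from
    koheesio.spark, or None if koheesio (or pyspark) cannot be imported."""
    try:
        # Import the types from koheesio.spark rather than from pyspark
        # directly, as the notes advise for Spark Connect compatibility.
        from koheesio.spark import DataFrame, SparkSession
    except ImportError:
        return None
    return DataFrame, SparkSession


def check_snowflake_modules():
    """Hypothetical helper: map each module path to True if it imports
    cleanly in this environment, else False."""
    status = {}
    for name in SNOWFLAKE_MODULES:
        try:
            import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    print("Spark types:", load_spark_types())
    for module_name, available in check_snowflake_modules().items():
        print(f"{module_name}: {'available' if available else 'not importable'}")
```

If the Snowflake modules show up as not importable, installing the koheesio[snowflake] extra mentioned above is the likely first thing to try.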

Full Changelog: koheesio-v0.8.1...koheesio-v0.9.0