Docs on how to transition to using dask-expr -- given that it's now the default for dataframes in the new dask release #968
Comments
Hi, thanks for reaching out. Do you have tracebacks that we could look at? One thing that's incorrect at first glance:
That's a private import, and it changed for dask-expr.
Can you see https://app.circleci.com/pipelines/github/DAGWorks-Inc/hamilton/2705/workflows/eef49863-3c50-4ff2-a68c-ced576de8385/jobs/45410 ? Yep, it could well be me using internal APIs -- hence docs on what changed would be useful.
I'm seeing a problem that may be related to this issue. With dask-expr installed, it happens with release 2024.3.0 and later. I am using dask.dataframe.read_parquet to read from a directory on S3 that contains multiple parquet files. I am running Python 3.10.14 [GCC 10.2.1 20210110] on a Linux machine (and I see this with Python 3.11 as well). If I include dask.config.set({"dataframe.query-planning": False}) in my code, the problem goes away, but that is not a good long-term solution, as I presume query planning will eventually be forced. I am running the following:
Which throws the following on the ddf.compute() call:
It seems the problem might be related to file statistics. I tried a few different ways of calling dd.read_parquet, but no luck. Thanks in advance for any help.
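For anyone else hitting this, the opt-out mentioned above can be sketched roughly as follows. This is only a sketch under two assumptions: that the setting is applied before dask.dataframe is first imported (in the 2024.x releases it is read at import time), and that the installed release still ships the legacy dataframe implementation.

```python
import dask

# Opt out of dask-expr query planning. Assumption: this runs before
# dask.dataframe is imported anywhere in the process, since the setting
# is read when that module is first imported.
dask.config.set({"dataframe.query-planning": False})

# Confirm the value is in effect in the global config.
query_planning = dask.config.get("dataframe.query-planning")
print(query_planning)

# Only import dask.dataframe after this point, e.g.:
# import dask.dataframe as dd
```

As noted in the comment above, this is a stopgap rather than a fix, since it sidesteps the dask-expr code path instead of addressing the failure in it.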
This was indeed a bug; I put up a PR to fix it.
Describe the issue:
Creating dataframes without dask-expr works, but now that it is the default in the latest release, it fails. I don't see any migration docs describing what the behavior changes are.
Minimal Complete Verifiable Example:
Run https://github.com/DAGWorks-Inc/hamilton/tree/main/plugin_tests/h_dask with the latest libraries (change conftest.py to use dask-expr query planning).
The code that is failing is this class https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/plugins/h_dask.py#L175 -- which has some custom stuff and assumptions about how dask used to behave.
When I set:
Everything works -- so I'm convinced it's dask-expr that's the issue.
Anything else we need to know?:
I posted https://dask.discourse.group/t/what-changed-in-the-latest-release-with-the-default-to-use-dask-expr/2597 too.
Environment: