Polars 1.3.0 throws parquet errors when reading (I think) categorical columns in settings that 1.1.0 does not. #17931

Closed
2 tasks done
mmcdermott opened this issue Jul 29, 2024 · 1 comment · Fixed by #17941
Assignees
Labels
A-io-parquet Area: reading/writing Parquet files accepted Ready for implementation bug Something isn't working P-medium Priority: medium python Related to Python Polars

Comments

@mmcdermott

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

I don't have a MWE at the moment, but wanted to post this in the interim so others can find it.

I have a confirmed test case in a public repo I'm working on, linked below: running the code with polars 1.1.0 works fine, while running it with 1.3.0 raises this error:

polars.exceptions.ComputeError: parquet: Not yet supported: Dictionary array without a dictionary page

The test case is this test file in this repo: https://github.com/mmcdermott/MEDS_transforms/blob/397916bc4941c270b205e8fbf5307aa407d5b56b/tests/test_extract.py

Here is an example of the test failing due to polars 1.3.0: https://github.com/mmcdermott/MEDS_transforms/actions/runs/10146769727/job/28055619925?pr=31
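
Until a minimal reproduction exists, a small helper along these lines might make it easier to tell this specific failure apart from unrelated read errors when comparing versions. This is only a sketch: the name `read_or_flag` and the constant `DICT_PAGE_ERROR` are hypothetical, and the check assumes the error text matches the message quoted above.

```python
from typing import Any, Callable, Optional, Tuple

# Message fragment copied from the error reported in this issue (assumed stable).
DICT_PAGE_ERROR = "Dictionary array without a dictionary page"


def read_or_flag(reader: Callable[[str], Any], path: str) -> Tuple[Optional[Any], bool]:
    """Call reader(path) and return (result, hit_dictionary_page_error).

    The flag is True only when the reader raised an exception whose message
    contains DICT_PAGE_ERROR. Any other exception is re-raised unchanged.
    """
    try:
        return reader(path), False
    except Exception as exc:
        if DICT_PAGE_ERROR in str(exc):
            return None, True
        raise
```

With Polars installed, this could be called as `read_or_flag(pl.read_parquet, path)` on each Parquet file the failing test reads, once under 1.1.0 and once under 1.3.0, to narrow down which file triggers the error.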

Log output

No response

Issue description

I suspect this has to do with categorical columns, but I'm not sure.

Expected behavior

It should work the same way on 1.3.0 as on 1.1.0.

Installed versions

--------Version info---------
Polars:               1.1.0
Index type:           UInt32
Platform:             Linux-5.15.0-117-generic-x86_64-with-glibc2.35
Python:               3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                2.0.1
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              17.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

@mmcdermott mmcdermott added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jul 29, 2024
@alexander-beedie alexander-beedie added the A-io-parquet Area: reading/writing Parquet files label Jul 29, 2024
@ritchie46
Member

@coastalwhite

@ritchie46 ritchie46 added P-high Priority: high and removed needs triage Awaiting prioritization by a maintainer labels Jul 30, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Jul 30, 2024
@coastalwhite coastalwhite added P-medium Priority: medium and removed P-high Priority: high labels Jul 30, 2024
@coastalwhite coastalwhite self-assigned this Jul 30, 2024
@coastalwhite coastalwhite linked a pull request Jul 31, 2024 that will close this issue
@github-project-automation github-project-automation bot moved this from Ready to Done in Backlog Jul 31, 2024
@c-peters c-peters added the accepted Ready for implementation label Aug 5, 2024