docs(python): Clarify documentation for schema in read_csv function #18759

Merged
merged 1 commit into from
Sep 22, 2024


bradfordlynch
Contributor

The read_csv function expects the order of the columns in a provided schema to match the order of the columns in the CSV file being read. This was not documented and led to unexpected behavior. Since dict types do not guarantee the ordering of their keys, this requirement was surprising to me.

This issue discusses the behavior, but no fix has been implemented. Improving the documentation is quick and will help users avoid the issue, provided they read it.

from io import StringIO

import polars as pl

csv = """A,B
1,"foo"
3,"bar"
"""

buf = StringIO(csv)

# Works fine
schema_good = {"A": pl.Int64, "B": pl.String}
pl.read_csv(buf, schema=schema_good)

# Raises ComputeError
buf.seek(0)
schema_bad = {"B": pl.String, "A": pl.Int64}
pl.read_csv(buf, schema=schema_bad)
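
One possible workaround, sketched here with only the standard library, is to reorder the schema dict so that its keys follow the CSV header before passing it to `read_csv`. The helper name `reorder_schema` is hypothetical and not part of Polars, and the dtype values below are plain-string stand-ins for Polars dtypes such as `pl.Int64` and `pl.String`.

```python
import csv
from io import StringIO


def reorder_schema(source, schema):
    """Return a copy of `schema` whose keys follow the CSV header order.

    Hypothetical helper for illustration; not part of the Polars API.
    """
    header = next(csv.reader(source))  # read only the first row
    source.seek(0)  # rewind so the buffer can be read again
    missing = [name for name in header if name not in schema]
    if missing:
        raise KeyError(f"schema has no entry for columns: {missing}")
    return {name: schema[name] for name in header}


csv_text = 'A,B\n1,"foo"\n3,"bar"\n'
buf = StringIO(csv_text)

# Stand-in dtype values; in practice these would be Polars dtypes.
schema_bad = {"B": "String", "A": "Int64"}
schema_fixed = reorder_schema(buf, schema_bad)
print(list(schema_fixed))
```

The reordered dict has the same entries as the original, so it should then be usable as the `schema=` argument in the example above in place of `schema_bad`.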

The `read_csv` function expects that the order of the columns in a provided schema match the order of the columns in the CSV file being read. This was not documented and led to unexpected behavior.
@github-actions github-actions bot added documentation Improvements or additions to documentation python Related to Python Polars labels Sep 15, 2024
@bradfordlynch
Contributor Author

I know that the contributing guidelines ask for PRs to be linked to specific issues. I tried to do that but I do not appear to have sufficient privileges to do so.

@alippai

alippai commented Sep 15, 2024

Just a nit: dict keys are guaranteed to be in insertion order since Python 3.7. It’s fully deterministic.
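
A small sketch of the point above, using plain strings as stand-ins for Polars dtypes: two dicts with the same entries compare equal regardless of key order, yet each one iterates in its own insertion order. That may be why an order-sensitive `schema` argument is easy to get wrong.

```python
# Stand-in dtype values; in practice these would be Polars dtypes.
schema_good = {"A": "Int64", "B": "String"}
schema_bad = {"B": "String", "A": "Int64"}

# Dict equality ignores key order...
same_entries = schema_good == schema_bad

# ...but iteration follows insertion order (guaranteed since Python 3.7).
order_good = list(schema_good)
order_bad = list(schema_bad)

print(same_entries, order_good, order_bad)
```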

@mcrumiller
Contributor

> I know that the contributing guidelines ask for PRs to be linked to specific issues. I tried to do that but I do not appear to have sufficient privileges to do so.

Just mention the issue number in your PR description and it'll automatically link. You can read the GitHub docs for more information.

@ritchie46 ritchie46 merged commit f08885c into pola-rs:main Sep 22, 2024
15 checks passed

codecov bot commented Sep 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.85%. Comparing base (4894e24) to head (563fcfb).
Report is 33 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #18759   +/-   ##
=======================================
  Coverage   79.85%   79.85%           
=======================================
  Files        1517     1517           
  Lines      205530   205530           
  Branches     2892     2892           
=======================================
+ Hits       164119   164126    +7     
+ Misses      40863    40856    -7     
  Partials      548      548           


@bradfordlynch bradfordlynch deleted the read_csv-docs branch September 23, 2024 13:36
4 participants