when "columns =" is specified, "pl.read_csv()" doesn't import columns based on the specified order of "columns = " #15027

Closed
2 tasks done
3SMMZRjWgS opened this issue Mar 13, 2024 · 5 comments
Labels
enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars)

Comments

@3SMMZRjWgS

3SMMZRjWgS commented Mar 13, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

from io import StringIO
import polars as pl

df = pl.read_csv(
    source=StringIO("id,y,x\n1,2,3"),
    columns=["id", "x", "y"],
)
print(df)

Log output

# ┌─────┬─────┬─────┐
# │ id  ┆ y   ┆ x   │  << should be x, y (not y, x) ?
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 2   ┆ 3   │
# └─────┴─────┴─────┘

Issue description

When columns is specified in pl.read_csv(), the method doesn't load the CSV columns in the specified order; instead, it retains the original column order from the CSV. Is there any way that this order can be specified by the columns = parameter?

Expected behavior

# ┌─────┬─────┬─────┐
# │ id  ┆ x   ┆ y   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 3   ┆ 2   │
# └─────┴─────┴─────┘

Installed versions

--------Version info---------
Polars:               0.20.14
Index type:           UInt32
Platform:             Windows-10-10.0.22631-SP0
Python:               3.11.8 | packaged by Anaconda, Inc. | (main, Feb 26 2024, 21:34:05) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.10.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.3
numpy:                1.26.4
openpyxl:             3.0.10
pandas:               2.2.1
pyarrow:              15.0.0
pydantic:             2.5.3
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@3SMMZRjWgS 3SMMZRjWgS added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels Mar 13, 2024
@alexander-beedie
Collaborator

alexander-beedie commented Mar 13, 2024

The "columns" param doesn't actually say that it will select the columns in the given order; just that it will select them. So, I wouldn't say this is a bug (as the parameter does do what it says it will do), but it would probably be an improvement if it did what you expected. I'll tag this as an enhancement rather than a bug 🤔

@alexander-beedie alexander-beedie added the enhancement (New feature or an improvement of an existing feature) label and removed the bug (Something isn't working) and needs triage (Awaiting prioritization by a maintainer) labels Mar 13, 2024
@3SMMZRjWgS
Author

3SMMZRjWgS commented Mar 13, 2024

Thank you so much for the quick response and for being willing to include it as an enhancement, Alex! I agree, it's not technically a bug. I stumbled upon it as a recent Polars user when I was doing pl.read_csv("path.csv", columns = ['a', 'b', 'c'], new_columns = ['x', 'y', 'z']), thinking that the two lists would match up in order. I then found that the new header names and the columns that were read in did not line up.

@mcrumiller
Contributor

Duplicate of #13066.

@mcrumiller
Contributor

mcrumiller commented Mar 13, 2024

FYI, there are a lot of CSV parameter issues; IMO the Python function needs a complete revamp, with some stricter requirements laid out. Off the top of my head (these are mostly my issues; I know there are others):

@stinodego
Member

Closing in favor of #13066

@stinodego stinodego closed this as not planned Jun 8, 2024

4 participants