`read_csv` fails on `schema` argument when `columns` is also provided #14227
Comments
If this is intended behavior, we should rename the parameter.
@mcrumiller In my opinion it should behave as you originally expected, i.e. only apply to the selected columns. I generally treat csv files as "external/out of my control", so I want to ensure that they contain at least the columns I need. I also think this is inconsistent.
You are right, this is confusing. "schema" usually means the complete file schema (see also pyspark) and will override/ignore the header and set the specified types in order. So a csv with a header "a,b,c" and a schema (c: type, b: type, a: type) will ignore(!) the csv header and set the names (c, b, a) and types in the order they appear in the file. The interaction between "schema" and other parameters like "columns", "new_columns" or "dtypes" should imo either be documented very precisely or NOT be allowed 🚫! 🤓
There is no bug here. You should be using the parameter suggested in #15431 (comment). Closing this one.
Checks
Reproducible example
Log output
Issue description
When the `columns` parameter is specified, the `schema` parameter ignores the dictionary keys and attempts to apply the values in the schema dictionary to the original columns.
Expected behavior
The schema should only be applied to the supplied columns.
Installed versions