JSON Column With all Null values is dropped #7858
This seems to be a problem with arrow2 itself rather than polars.
I've noticed the same problem occurs for data where the column is mostly null, or mostly an empty list, e.g.

[
    {"id": 1, "vals": []},
    {"id": 2, "vals": []},
    ...
    {"id": 99, "vals": ["not empty"]},
]

pl.read_json("example.json")  # ==> doesn't have a `vals` column
Encountered this in 0.19.6 too
Has anyone found a workaround here, besides manually writing out the schema ahead of time?
Encountered this as well |
I can confirm this happens in Polars 0.19.15, and it ignores empty arrays as well as nulls (but not empty objects). I encountered this when calling

Test case:

import polars as pl
import io
data = io.BytesIO(b'''\
{"id": 1, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
{"id": 2, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
{"id": 3, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
{"id": 4, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
''')
df = pl.read_ndjson(data)
print(df)
# shape: (4, 3)
# ┌─────┬─────────────┬─────────────────────┐
# │ id ┆ zero_column ┆ empty_object_column │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ struct[1] │
# ╞═════╪═════════════╪═════════════════════╡
# │ 1 ┆ 0 ┆ {null} │
# │ 2 ┆ 0 ┆ {null} │
# │ 3 ┆ 0 ┆ {null} │
# │ 4 ┆ 0 ┆ {null} │
# └─────┴─────────────┴─────────────────────┘

Platform details:
Pandas (2.1.3) does not drop the columns and doesn't cause trouble in

import pandas as pd
import io
data = io.BytesIO(b'''\
{"id": 1, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
{"id": 2, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
{"id": 3, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
{"id": 4, "zero_column": 0, "empty_array_column": [], "empty_object_column": {}, "null_column": null}
''')
df = pd.read_json(data, lines=True)
print(df.to_string())
# id zero_column empty_array_column empty_object_column null_column
# 0 1 0 [] {} NaN
# 1 2 0 [] {} NaN
# 2 3 0 [] {} NaN
# 3    4            0                 []                  {}          NaN
@tkarabela I don't think the empty arrays issue has been reported before. I'm not sure if you want to post your comment as a separate issue. It seems like a better example, which could supersede this issue and #11860 (both of which have gone a bit stale).
@cmdlineluser I'm currently investigating the root cause in the arrow2 library. So far it looks like the issue really is there, and from polars' point of view the solution would be to upgrade to a version of arrow2 which doesn't have the bug. I'll try to put together a PR to arrow2 to fix this, then I'll post an issue here.
@tkarabela Ah okay. The nulls issue is because of #11880. I'm also not sure if Polars uses arrow2 anymore: #11179
@cmdlineluser Thanks, I must have missed this and went straight into arrow2. I'll make an issue for the empty array problem then :) |
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Reading a JSON string into a dataframe will drop a column if all of its values are null. I expect the column to be present, filled with null values.
Reproducible example
Expected behavior
from io import BytesIO
import pandas as pd
json = BytesIO(bytes('''
[
{
"a": 1,
"b": null
}
]
''', 'UTF-8'))
df_pandas = pd.read_json(json)
print(df_pandas)
Installed versions