Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added check to JsonReader::finish that infer_schema_len was not Some(0) #10574

Closed
wants to merge 3 commits into from

Conversation

rben01
Copy link
Contributor

@rben01 rben01 commented Aug 17, 2023

(returns PolarsError::InvalidOperation if it is)
Added test that this case is caught correctly
Updated docs of JsonReader::infer_schema_len(...) to state that it applies to both JsonFormat::JsonLines and ::Json Clarified other error message in JsonReader::finish

(returns PolarsError::InvalidOperation if it is)
Added test that this case is caught correctly
Updated docs of JsonReader::infer_schema_len(...) to state that it applies to both JsonFormat::JsonLines and ::Json
Clarified other error message in JsonReader::finish
@rben01
Copy link
Contributor Author

rben01 commented Aug 17, 2023

I should mention that this fixes a panic that occurred when infer_schema_len was Some(0); the panic is now an error.

/// Set the JSON reader to infer the schema of the file. Currently, this is only used when reading from
/// [`JsonFormat::JsonLines`], as [`JsonFormat::Json`] reads in the entire array anyway.
/// Set the number of records for the JSON reader to infer the schema of the file
/// from.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your edit the "Currently, this is only used when reading from [JsonFormat::JsonLines], as [JsonFormat::Json] reads in the entire array anyway." got lost. Can you put that back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It changed — both formats now use infer_schema_len. JsonFormat::Json will read in the entire array, but can inspect a subset of records for schema inference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, just following up on this. The removal of that part of the comment was intentional because the removed content was outdated. Now, both JsonFormat::JsonLines and JsonFormat::Json can use infer_schema_len to check only a subset of records when inferring the schema. (Not sure when it changed.)

@rben01 rben01 deleted the branch pola-rs:main August 25, 2023 03:52
@rben01 rben01 closed this Aug 25, 2023
@rben01 rben01 deleted the main branch August 25, 2023 03:52
@rben01
Copy link
Contributor Author

rben01 commented Aug 29, 2023

Sorry, I accidentally closed this when I mucked with some branches. Should I open a separate PR?

@orlp
Copy link
Collaborator

orlp commented Aug 29, 2023

I can't reopen this PR because 'the branch was force-pushed or recreated', so you'll have to open a new PR I'm afraid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants