This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
arrow2::io::json::read::infer_records_schema reads only the first element of the JSON array and infers the Schema from it. This can produce a faulty Schema when not every element of the array contains every field. In essence, infer_records_schema assumes the input JSON already has the structure of an arrow2::chunk::Chunk.
Potential Solution
Rather than reading just the first element, infer_records_schema should read all the elements and coerce their Schemas into one. This is how the infer function creates the DataType (see this).
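To make the proposal concrete, here is a minimal sketch of inferring a schema from every record rather than only the first. It is not arrow2's implementation: `Ty`, `coerce`, and `infer_schema_all` are hypothetical stand-ins written for illustration, and the coercion rules (integer plus float widens to float, any other mismatch falls back to a string) are an assumption.

```rust
// Hypothetical, simplified stand-in for arrow2's DataType;
// not the library's actual type.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    Float64,
    Boolean,
    Utf8,
}

/// Coerce two observed types into one that can hold both.
/// The rules here are assumed for illustration only.
fn coerce(a: &Ty, b: &Ty) -> Ty {
    match (a, b) {
        (x, y) if x == y => x.clone(),
        (Ty::Int64, Ty::Float64) | (Ty::Float64, Ty::Int64) => Ty::Float64,
        // Fall back to strings for any other mismatch.
        _ => Ty::Utf8,
    }
}

/// Infer a schema by visiting every record, keeping fields in
/// first-seen order and coercing the type of repeated fields.
fn infer_schema_all(records: &[Vec<(&str, Ty)>]) -> Vec<(String, Ty)> {
    let mut schema: Vec<(String, Ty)> = Vec::new();
    for record in records {
        for (name, ty) in record {
            match schema.iter_mut().find(|(n, _)| n.as_str() == *name) {
                Some((_, existing)) => {
                    let merged = coerce(existing, ty);
                    *existing = merged;
                }
                None => schema.push((name.to_string(), ty.clone())),
            }
        }
    }
    schema
}
```

Fed the three records from the example in this issue, this sketch collects the fields a, c, b, d in that order, which matches the field order that infer reports.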
Issue Example
```rust
use arrow2::io;

// json (parsed via the json_deserializer re-export in arrow2::io::json::read)
let json = io::json::read::json_deserializer::parse(
    br#"[{"a":0, "c":"hello"},{"a":1, "b":false, "c":"hello"},{"a":2, "c":"world", "d":3.14}]"#,
)
.unwrap();

// schema
let schema = io::json::read::infer_records_schema(&json).unwrap();
println!("{schema:#?}");
// ⚠️ Schema has fields corresponding to only the first value in Json array
//
// Schema {
//     fields: [
//         Field {
//             name: "a",
//             data_type: Int64,
//             is_nullable: true,
//             metadata: {},
//         },
//         Field {
//             name: "c",
//             data_type: Utf8,
//             is_nullable: true,
//             metadata: {},
//         },
//     ],
//     metadata: {},
// }

// datatype
let data_type = io::json::read::infer(&json).unwrap();
println!("{data_type:#?}");
// ⚠️ DataType has fields corresponding to all values in Json array
//
// List(
//     Field {
//         name: "item",
//         data_type: Struct(
//             [
//                 Field {
//                     name: "a",
//                     data_type: Int64,
//                     is_nullable: true,
//                     metadata: {},
//                 },
//                 Field {
//                     name: "c",
//                     data_type: Utf8,
//                     is_nullable: true,
//                     metadata: {},
//                 },
//                 Field {
//                     name: "b",
//                     data_type: Boolean,
//                     is_nullable: true,
//                     metadata: {},
//                 },
//                 Field {
//                     name: "d",
//                     data_type: Float64,
//                     is_nullable: true,
//                     metadata: {},
//                 },
//             ],
//         ),
//         is_nullable: true,
//         metadata: {},
//     },
// )
```