Adding support for required fields #41

Open
charlyraffellini opened this issue Sep 3, 2019 · 1 comment

Since draft 4, JSON Schema object definitions support a required property. This property is an array naming the fields that must be present in the object instance.

Fields that are not required may be absent from an instance. Spark schemas, however, have no way to express an absent column. At the moment, fields that are neither nullable nor required are marked as non-nullable in the generated Spark schema, even though they may be missing from the data.

JSON Schema:

{
  "type": "object",
  "required": [],
  "properties": {
    "size": {
      "type": "integer"
    }
  }
}

The current Spark schema being generated (serialized as JSON):

{
  "type" : "struct",
  "fields" : [ {
    "name" : "size",
    "type" : "long",
    "nullable" : false,
    "metadata" : { }
  } ]
}

Proposed Spark schema (serialized as JSON):

{
  "type" : "struct",
  "fields" : [ {
    "name" : "size",
    "type" : "long",
    "nullable" : true,
    "metadata" : { }
  } ]
}
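
To make the proposed rule concrete, here is a rough sketch in plain Python with no Spark dependency. The function to_spark_struct and the simplified TYPE_MAP are hypothetical names used only for illustration, not this library's API. The assumption is that a field becomes nullable when it is missing from required or when its type list includes "null".

```python
import json

# Hypothetical, simplified mapping from JSON Schema types to Spark type names.
TYPE_MAP = {
    "integer": "long",
    "number": "double",
    "string": "string",
    "boolean": "boolean",
}


def to_spark_struct(schema):
    """Convert a flat JSON Schema object into a Spark struct schema (as a dict)."""
    required = set(schema.get("required", []))
    fields = []
    for name, prop in schema.get("properties", {}).items():
        declared = prop["type"]
        types = declared if isinstance(declared, list) else [declared]
        # First non-null type decides the Spark column type.
        base = next(t for t in types if t != "null")
        # Assumed rule: nullable if the type allows null OR the field is not required.
        nullable = "null" in types or name not in required
        fields.append(
            {"name": name, "type": TYPE_MAP[base], "nullable": nullable, "metadata": {}}
        )
    return {"type": "struct", "fields": fields}


schema = {
    "type": "object",
    "required": [],
    "properties": {"size": {"type": "integer"}},
}
print(json.dumps(to_spark_struct(schema), indent=2))
```

For the schema from this issue, size comes out with nullable set to true. Adding "size" to the required array flips it back to false.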
hesserp (Contributor) commented Dec 1, 2020

Hi @charlyraffellini, my main concern is that this is not backwards compatible. For example, if a schema relied on types to define whether something is nullable and did not use the required field, then upgrading would turn every field that was not supposed to be nullable into a nullable one. On the other hand, I understand that using the required field is the more common convention and might be easier to understand.

So if we are going to have such a breaking change, what about ignoring the null type completely and just using required? Both serve the same purpose of deciding whether a field is nullable.
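
The compatibility concern can be illustrated with a small sketch in plain Python. All names here (nullable_by_type, nullable_by_required, nullable_flags, legacy_schema) are hypothetical and only approximate the two rules under discussion: the type-based rule described in the issue, and a required-only rule that ignores the null type.

```python
def types_of(prop):
    """Return the declared JSON Schema types of a property as a list."""
    declared = prop.get("type", [])
    return declared if isinstance(declared, list) else [declared]


def nullable_by_type(name, prop, required):
    # Type-based rule: only a "null" type makes a field nullable.
    return "null" in types_of(prop)


def nullable_by_required(name, prop, required):
    # Required-only rule: the null type is ignored entirely.
    return name not in required


def nullable_flags(schema, rule):
    """Apply a nullability rule to every property of a flat object schema."""
    required = set(schema.get("required", []))
    return {
        name: rule(name, prop, required)
        for name, prop in schema.get("properties", {}).items()
    }


# A schema written for the type-based rule: no required array at all.
legacy_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "note": {"type": ["string", "null"]},
    },
}

before = nullable_flags(legacy_schema, nullable_by_type)
after = nullable_flags(legacy_schema, nullable_by_required)
changed = sorted(name for name in before if before[name] != after[name])
print("changed fields:", changed)
```

Here id is non-nullable under the type-based rule but becomes nullable under the required-only rule, because the schema never listed it in required. Schemas written this way would need a required array added before upgrading.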
