Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check_schema does not resolve $refs (and so does not actually check the schema) #854

Closed
skamansam opened this issue Oct 7, 2021 · 15 comments

Comments

@skamansam
Copy link

skamansam commented Oct 7, 2021

I am trying to develop a system where special users can create their own fragments of a json schema, and then the system will build the schema in order to validate config objects. The problem I am seeing is that the check_schema does not throw any errors when $refs are invalid, but the validate() method does throw them. IMHO, this means the check_schema does not fully check the schema.

Schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://server.com/fragments/schema.json",
  "type": "object",
  "title": "Test schema for fragment assembly",
  "description": "Test document for json schema fragment assembly",
  "default": {},
  "readOnly": true,
  "additionalProperties": false,
  "definitions": {},
  "properties": {
    "additionalProperties": false,
    "$schema": {
      "type": "string",
      "title": "Schema String",
      "description": "The location of the validating schema.",
      "default": "https://server.com/fragments/schema.json"
    },
    "fragments": {
      "$id": "/properties/fragments",
      "type": "object",
      "title": "fragments",
      "description": "fragments  of schemas",
      "default": {},
      "readOnly": true,
      "additionalProperties": false,
      "examples": [
        {
            "Fragment1": {},
            "Fragment2": {}
        }
      ],
      "properties": {
        "additionalProperties": false,
        "Fragment1": {   # these entries are uploaded by users, and are invalid due to invalid $refs
          "title": "Enrichment1",
          "type": "object",
          "properties": {
            "text": {
              "title": "Text",
              "description": "A description of this param",
              "type": "string"
            },
            "fixed_options": {
              "description": "This one has a default",
              "default": "option1",
              "allOf": [
                {
                  "$ref": "#/definitions/StringOptionSet1"
                }
              ]
            },
            "other_stuff": {
              "title": "Other Stuff",
              "description": "Optional and more complex",
              "type": "array",
              "items": {
                "type": "object"
              }
            }
          },
          "required": [
            "text"
          ],
          "definitions": {
            "StringOptionSet1": {
              "title": "StringOptionSet1",
              "description": "An enumeration.",
              "enum": [
                "option1",
                "option2"
              ],
              "type": "string"
            }
          }
        }
      }
    }
  }
}

Code:

In order to get validation to work, I need a config object. However, I want to know whether the schema is valid BEFORE I have a config object, as users will be uploading their own fragment validations, so I need to know if the whole document will validate BEFORE anyone uses the schema to validate anything. (The code below is pretty much a copy of validate().)

from jsonschema import validate
from jsonschema.validators import validator_for
config = {
    "fragments": {
        "Fragment1": {
            "text": "Lorem Ipsum One",
            "fixed_options": "option1",
            "other_stuff": [{"other1": "string", "other2": 2, "other3": [{"key": "value"}]}]
        },
        "Fragment2": {
            "text": "Lorem Ipsum One",
            "fixed_options": "option3",
            "other_stuff": [{"other1": "string", "other2": 2, "other3": [{"key": "value"}]}]
        }
    }
}

print('Detecting validator class for schema:')
validator_class = validator_for(schema_template)

print(f'Validating schema with {validator_class.__name__}, check_schema()')
valid = validator_class.check_schema(schema_template)
print('Schema is valid!')

print('Validating config with validate():')
validate(instance=config,schema=schema_template)
print("config is valid.")

Output

Detecting validator class for schema:
Validating schema with Draft202012Validator, check_schema()
Schema is valid!
Validating config with validate():
Traceback (most recent call last):
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 844, in resolve_fragment
    document = document[part]
KeyError: 'definitions'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/backup/sam/workspace/api-server/tests/jsonschema-pydantic-test.py", line 110, in <module>
    validate(instance=config,schema=schema_template)
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 965, in validate
    error = exceptions.best_match(validator.iter_errors(instance))
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/exceptions.py", line 357, in best_match
    best = max(itertools.chain([best], errors), key=key)
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 224, in iter_errors
    for error in errors:
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/_validators.py", line 329, in properties
    yield from validator.descend(
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 240, in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 224, in iter_errors
    for error in errors:
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/_validators.py", line 329, in properties
    yield from validator.descend(
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 240, in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 224, in iter_errors
    for error in errors:
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/_validators.py", line 329, in properties
    yield from validator.descend(
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 240, in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 224, in iter_errors
    for error in errors:
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/_validators.py", line 359, in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 240, in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 224, in iter_errors
    for error in errors:
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/_validators.py", line 291, in ref
    scope, resolved = validator.resolver.resolve(ref)
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 783, in resolve
    subschema = self.resolve_fragment(subschema, fragment)
  File "/home/sam/.pyenv/versions/3.9.7/lib/python3.9/site-packages/jsonschema/validators.py", line 846, in resolve_fragment
    raise exceptions.RefResolutionError(
jsonschema.exceptions.RefResolutionError: Unresolvable JSON pointer: 'definitions/StringOptionSet1'

EDIT: updated the code so my problem is more clear.

@Julian
Copy link
Member

Julian commented Oct 7, 2021

check_schema indeed essentially just means "is the schema valid under the metaschema".

There are even things in the specification which the metaschema doesn't cover but which mean the schema isn't valid according to the specification (and check_schema currently has no way of checking those either unfortunately).

But in your scenario -- what would you want check_schema to do if there were transient networking issues? Resolving some network reference may work the first time and fail the second.

@Julian
Copy link
Member

Julian commented Oct 7, 2021

For your specific case, though I haven't read super carefully, you may just want to try/except and if you get things other than a ValidationError you know the schema has an issue.

@skamansam
Copy link
Author

For your specific case, though I haven't read super carefully, you may just want to try/except and if you get things other than a ValidationError you know the schema has an issue.

The issue is that it's NOT raising an error when it should. It raises an error in other cases, such as malformed $ids, but not for unresolved $refs. My real question is why does it validate the $refs in validate(), but not in check_schema()?

@Julian
Copy link
Member

Julian commented Oct 7, 2021

The issue is that it's NOT raising an error when it should.

As I say, a try/except when you run validation will cover that.

why does it validate the $refs in validate(), but not in check_schema()

It doesn't validate the ref -- it's dereferencing it so that it can apply the validation rules at the thing that was referenced -- i.e. it needs to know what thing the ref refers to so it can use it.

And as for why it doesn't really help to do that ahead of time:

what would you want check_schema to do if there were transient networking issues? Resolving some network reference may work the first time and fail the second

@skamansam
Copy link
Author

The issue is that it's NOT raising an error when it should.

As I say, a try/except when you run validation will cover that.

A try/except is supposed to catch errors when they happen. My issue is that there are no errors to catch, but there should be.

why does it validate the $refs in validate(), but not in check_schema()

It doesn't validate the ref -- it's dereferencing it so that it can apply the validation rules at the thing that was referenced -- i.e. it needs to know what thing the ref refers to so it can use it.

So why doesn't it do this in check_schema()?

And as for why it doesn't really help to do that ahead of time:

what would you want check_schema to do if there were transient networking issues? Resolving some network reference may work the first time and fail the second

This is exactly the same issue as if it were checked only once - what happens if there are transient network errors when validate() is called? It may not resolve then, either.

@Julian
Copy link
Member

Julian commented Oct 7, 2021

An exception occurs at validation time (as opposed to when you ran check_schema) -- you pasted it in this ticket. Catch that in a try/except.

This is exactly the same issue as if it were checked only once - what happens if there are transient network errors when validate() is called? It may not resolve then, either.

This is precisely my point.

@skamansam
Copy link
Author

I think I know your confusion now. In the stacktrace, it says this line generates the error:

error = schema_exceptions.best_match(validator.iter_errors(config))

However, it should be an error from 3 lines above it that reads:

valid = validator_class.check_schema(schema_template)

If you notice in the output, it says:

validating with <class 'jsonschema.validators.Draft202012Validator'>
Valid schema! None
Traceback (most recent call last):
 ...

The first two lines are the print() statements from my code. Line 2 should not be present, because it is happens AFTER calling check_schema.

@Julian
Copy link
Member

Julian commented Oct 7, 2021

I don't believe I'm confused -- I know what you're asking, I've explained why it's not well defined, and suggested what you can do instead.

@Julian
Copy link
Member

Julian commented Oct 7, 2021

The only other thing I can add is perhaps what you want is python-jsonschema/referencing#2 and to pre-resolve your schema -- a PR is welcome for that!

@skamansam
Copy link
Author

skamansam commented Oct 7, 2021

Here is my feature suggestion, then:

check_schema() should throw the same SchemaErrors as validate().

(I still don't understand why it currently does not, but that is beside the point.)

@Julian
Copy link
Member

Julian commented Oct 7, 2021

I've explained why I believe that isn't a guarantee check_schema can make (i.e. why today check_schema semantically means "the schema is valid under the metaschema" and is documented as such and does not guarantee "using it won't produce any exception other than ValidationError"). I'd encourage you to reread the above, specifically:

But in your scenario -- what would you want check_schema to do if there were transient networking issues? Resolving some network reference may work the first time and fail the second.

And to consider (or propose) how the guarantee you're proposing (or expecting) would be consistently made were it part of check_schema's goal.

You're welcome to ask follow-ups, but at the minute it seems we're talking over each other. I'm going to close this however, since I'm fairly confident it's not something that can be addressed other than via python-jsonschema/referencing#2.

@Julian Julian closed this as completed Oct 7, 2021
@skamansam
Copy link
Author

So my question is why is it a guarantee that validate() can make and not check_schema()?

@Julian
Copy link
Member

Julian commented Oct 7, 2021

Validate indeed also does not guarantee it won't raise any exception other than ValidationError when called -- as you're observing yourself here.

@robherring
Copy link
Contributor

@skamansam I implemented just what you are asking for: https://github.com/robherring/dt-schema/blob/master/dtschema/lib.py#L685

Don't use a subclass though. @Julian says that's not an ABI (and 4.0.0 broke it).

@skamansam
Copy link
Author

@robherring Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants