Allow per item validators to be specified by string #419

rvandam · 2023-09-13T20:41:42Z

We ran into a problem where another extension required our settings to be JSON-serializable, which was barfing on the item classes used as keys when specifying JSON schemas. I'm separately working on a fix for that extension, but since under the hood the classes are just getting converted to obj.__name__ anyway, I wondered why I couldn't just specify them that way to begin with.
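To make the serialization problem concrete, here is a minimal sketch. `DummyItem` is a hypothetical stand-in for a real item class, and the schema path is the placeholder used in the docs change; the only thing shown is how Python's standard `json` module treats class keys versus string keys.

```python
import json


class DummyItem:
    """Hypothetical item class standing in for a real Scrapy item."""


# Mapping keyed by the class object itself, as the existing setting allows.
schemas_by_class = {DummyItem: "/path/to/dummyitem_schema.json"}

try:
    json.dumps(schemas_by_class)
    class_keys_serializable = True
except TypeError:
    # json only accepts str, int, float, bool or None as dict keys.
    class_keys_serializable = False

# The same mapping keyed by a string serializes without trouble.
schemas_by_name = {"DummyItem": "/path/to/dummyitem_schema.json"}
serialized = json.dumps(schemas_by_name)
```

Any setting that must round-trip through JSON therefore needs string keys, which is the motivation for this PR.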

rvandam · 2023-09-14T22:15:08Z

This latest version should pass all tests, including the new one, at least on Python 3.9-3.11 (I don't have 3.8 installed to test, but it should be fine).

codecov · 2023-09-15T07:29:02Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (6edb037) 79.39% compared to head (a75e6fb) 79.39%.

❗ Current head a75e6fb differs from pull request most recent head 5acd876. Consider uploading reports for the commit 5acd876 to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #419   +/-   ##
=======================================
  Coverage   79.39%   79.39%           
=======================================
  Files          76       76           
  Lines        3222     3222           
  Branches      534      534           
=======================================
  Hits         2558     2558           
  Misses        593      593           
  Partials       71       71

Files Changed	Coverage Δ
spidermon/contrib/scrapy/pipelines.py	`97.80% <100.00%> (ø)`


Gallaecio · 2023-09-15T07:49:44Z

docs/source/item-validation.rst

+        'DummyItem': '/path/to/dummyitem_schema.json',
+        'OtherItem': '/path/to/otheritem_schema.json',


Hmm… I don’t like that this is done by name instead of import path. Would you be OK with switching approaches? It is how Scrapy settings handle such situations (in fact, Scrapy settings first supported import paths as strings, and later also added support for actual objects).

Gallaecio · 2023-09-15T07:54:54Z

spidermon/contrib/scrapy/pipelines.py

@@ -56,7 +56,7 @@ def set_validators(loader, schema):
            if type(schema) in (list, tuple):
                schema = {Item: schema}
            for obj, paths in schema.items():
-                key = obj.__name__
+                key = obj.__name__ if hasattr(obj, "__name__") else str(obj)
                paths = paths if type(paths) in (list, tuple) else [paths]
                objects = [loader(v) for v in paths]
                validators[key].extend(objects)


Hmm…

So, I am not very familiar with Spidermon code, but it feels wrong that the pre-existing code was only taking into account obj.__name__. Sounds like different item types with the same name but imported from different modules would be validated with the same schema.

I find the pre-existing code a bit hard to read, though, so I might be misreading.
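The concern about same-named classes can be illustrated with a small sketch. It is not Spidermon's actual code: the registry and the simplified `find_validators` below only mimic the keying by `__name__` visible in the diff, and the two `Product` classes and their module names are hypothetical.

```python
from collections import defaultdict


def make_item_class(module_name):
    # Build a class named "Product" that reports the given module,
    # imitating two same-named classes imported from different modules.
    return type("Product", (), {"__module__": module_name})


ShopProduct = make_item_class("shop.items")
BlogProduct = make_item_class("blog.items")

# Registry keyed only by the unqualified class name, as in the diff above.
validators = defaultdict(list)
for cls, schema in [
    (ShopProduct, "shop_schema.json"),
    (BlogProduct, "blog_schema.json"),
]:
    validators[cls.__name__].append(schema)


def find_validators(item):
    # Simplified lookup: the module of the item class plays no role.
    return validators.get(item.__class__.__name__, [])


found = find_validators(ShopProduct())
```

Both schemas end up under the single key `"Product"`, so an item of either class is checked against both.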

rvandam · 2023-09-15

Your read is correct. The find_validators method at line 126 only uses item.__class__.__name__ to select a validator out of the dict, so two classes with the same name from different modules still get the same schema. This is why I went with simple class names in the string alternative: they were already being stored that way internally, so the change was small.

I like the idea of using fully qualified import paths as keys but it would require some careful thought to be backwards compatible (unless we want a breaking change?). Internally we'd have to store validators using the import path but then during lookup if we fail to find a given item class we could loop over keys and check just the class name part.

rvandam · 2023-09-15

Actually, it might be cleaner to just store two keys internally, the fully qualified path and the class name, and then look them up in that order. No more looping over strings and splitting them.
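A rough sketch of the two-key idea, under the assumption that validators live in a plain dict. The helpers `qualified_name`, `register` and `lookup` are hypothetical names for illustration, not part of Spidermon, and the `Product` classes are made up.

```python
def qualified_name(cls):
    # Fully qualified import path of a class, e.g. "shop.items.Product".
    return f"{cls.__module__}.{cls.__qualname__}"


def register(validators, cls, schema):
    # Store each schema under both the qualified path and the bare name.
    for key in (qualified_name(cls), cls.__name__):
        validators.setdefault(key, []).append(schema)


def lookup(validators, item):
    # Prefer the qualified path; fall back to the bare class name.
    cls = item.__class__
    for key in (qualified_name(cls), cls.__name__):
        if key in validators:
            return validators[key]
    return []


ShopProduct = type("Product", (), {"__module__": "shop.items"})
BlogProduct = type("Product", (), {"__module__": "blog.items"})

validators = {}
register(validators, ShopProduct, "shop_schema.json")
# Only ShopProduct is registered, so BlogProduct falls back to the bare
# name and still gets the shop schema (the legacy behavior).
legacy_hit = list(lookup(validators, BlogProduct()))

register(validators, BlogProduct, "blog_schema.json")
shop_hit = list(lookup(validators, ShopProduct()))
blog_hit = list(lookup(validators, BlogProduct()))
```

Once both classes are registered, each resolves to its own schema through the qualified key, while the bare-name key keeps the old fallback available.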

rvandam · 2023-11-16

I finally came back to look at this again, and this is a bigger lift than I have time to work on right now, especially being less familiar with this code myself. Besides requiring changes to the existing internal representation and all the related tests, it would likely be considered a breaking change, given that someone could be inadvertently relying on the (admittedly bad) behavior of mapping only by the unqualified class name, and it would need some careful documentation.

Can I request that you either accept this PR as is, since it's consistent with the existing code, or reject it and turn @Gallaecio's request into its own issue?

Gallaecio · 2023-11-17

What if we keep the current behavior, but require strings to be complete import paths, load them with load_object, and then use their name?

    obj = load_object(obj)
    key = obj.__name__

It’s backward-compatible, and while not fixing the “comparison by name” issue now, it does allow us to fix it in the future if we want without requiring user code to change, which would not be possible if we started allowing class names as strings now.
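The compromise can be sketched without depending on Scrapy. `load_object_sketch` below is a hypothetical stand-in for the `load_object` mentioned above (presumably Scrapy's helper of that name, which is an assumption here), and `validator_key` is an illustrative name; a standard-library class is used as the import target so the example runs on its own.

```python
from collections import OrderedDict
from importlib import import_module


def load_object_sketch(path):
    """Stand-in loader: return non-strings unchanged, import string paths."""
    if not isinstance(path, str):
        return path
    module_path, _, name = path.rpartition(".")
    if not module_path:
        # A bare class name such as "OrderedDict" is rejected on purpose.
        raise ValueError(f"Not a full import path: {path!r}")
    return getattr(import_module(module_path), name)


def validator_key(obj):
    # Same key as before: the unqualified class name.
    return load_object_sketch(obj).__name__


key_from_string = validator_key("collections.OrderedDict")
key_from_class = validator_key(OrderedDict)

try:
    validator_key("OrderedDict")
    bare_name_rejected = False
except ValueError:
    bare_name_rejected = True
```

Because users write full import paths from the start, the internal key could later switch to the qualified path without any change to their settings.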

rvandam · 2023-11-17

That seems like it would work well as a compromise, keeping existing behavior intact while solving the problem I was originally dealing with.

Allow per item validators to be specified by string

6c1a7d8

rvandam changed the title from "Master" to "Allow per item validators to be specified by string" on Sep 13, 2023

VMRuiz requested a review from Gallaecio September 14, 2023 06:40

Safer test for class vs string

a75e6fb

Gallaecio reviewed Sep 15, 2023

Update docs/source/item-validation.rst

5acd876

Co-authored-by: Adrián Chaves <[email protected]>
