Modify Models to allow regex plugins to use Publisher configs #88
Comments
I think this would have to be new functionality added to the GSM itself, the ability to execute one or all of the journal_url fields on the publisher reg. form as regexes against incoming article URL-s. We've done it before in crude form with http://idfind.cottagelabs.com/ which identifies identifiers using crowdsourced regexes - what you're describing is basically a subset of that problem, identify a set of URL-s. |
True. At the moment the GSM doesn't seem to use the method that I mentioned. Not sure whether this is deliberate or not. I'm not sure that it matters; I guess the question is where that capability best sits, and how you recognise that something is a regex? |
Oh, there'd be no way to recognise a string as a regex, the person submitting it would have to tell you it is one. Normal URL-s are valid regexes too after all (they just match themselves, maybe the dots would act as wildcards). Just leaving this here for future: if this is implemented, it's probably a good idea to validate the regexes on form submission (re.compile them) and if they are not valid, point the user to the right page & section of the python docs so they at least have a chance of achieving what they want. |
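A minimal sketch of the validation idea above, assuming the form handler receives the submitted pattern as a plain string. The function name `validate_journal_url_regex` and the `REGEX_DOCS_URL` constant are made up for illustration and are not part of the GSM code. It also shows why a plain URL cannot be told apart from a regex: it compiles fine, and its unescaped dots match any character.

```python
import re

# Assumed pointer to show users whose pattern does not compile.
REGEX_DOCS_URL = "https://docs.python.org/3/library/re.html#regular-expression-syntax"


def validate_journal_url_regex(pattern):
    """Return (ok, message) for a string the submitter has flagged as a regex."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return False, "Invalid regex ({0}). See {1}".format(exc, REGEX_DOCS_URL)
    return True, ""


# A plain URL is itself a valid regex, and its unescaped dots act as wildcards.
plain = "http://www.example.com/journal"
print(validate_journal_url_regex(plain))
print(bool(re.match(plain, "http://wwwXexample.com/journal")))

# An unbalanced parenthesis fails to compile, so the user gets the docs pointer.
print(validate_journal_url_regex("http://example.com/(unclosed"))
```

Since a compile check can only reject malformed patterns, the form would still need an explicit "this is a regex" flag from the submitter.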
This fetches all journal URL-s in all of the data.
Actually, rereading this part, I'm not sure if this is so... I thought you meant we need to allow people to submit regexes in the journal URL field, so that the system can then match incoming article URL-s against the regexes. And if one of them matches, then its config will be used. Why would we apply a regex to all the URL-s in configs? I.e. what would 1 of them matching actually mean? It's not an incoming article URL for us to say "well we should use the OUP config", it's just a bunch of other configs. |
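To make the reading in this comment concrete, here is a rough sketch of matching an incoming article URL against regexes stored in publisher configs. The dict layout, the `journal_url_regexes` key and the publisher names are all assumptions for illustration; the real config records may look quite different.

```python
import re

# Made-up stand-ins for publisher configs; the field names are assumed.
CONFIGS = [
    {
        "publisher_name": "Example Press",
        "journal_url_regexes": [r"^https?://journals\.example\.org/"],
    },
    {
        "publisher_name": "Sample Society",
        "journal_url_regexes": [r"^https?://[a-z]+\.sample\.net/content/"],
    },
]


def find_configs_for_article(article_url, configs):
    """Return every config with at least one regex that matches the article URL."""
    matches = []
    for cfg in configs:
        if any(re.search(p, article_url) for p in cfg["journal_url_regexes"]):
            matches.append(cfg)
    return matches


hits = find_configs_for_article("http://bio.sample.net/content/12/3/45", CONFIGS)
print([cfg["publisher_name"] for cfg in hits])
```

More than one config can match the same article URL, so the caller would have to decide whether to use the first hit, all of them, or the most specific one.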
So my understanding (possibly mistaken) was that the SpringerLink approach So the flow is: DOI -> URL So this doesn't work for the OUP case because we'd need to populate the OUP For OUP the issue is that we're discovering additional licenses texts and There are other cases where we have both a need for regex to identify the An alternative requires a way for a plugin to determine which configs are So this would go: In this case the config URL is not used to match against the page URL but as Clear as mud? From: Emanuil Tolev
|
The OUP plugin is getting out of hand and it would be helpful to move it from a static license based approach to one like the SpringerLink plugin that can check publisher configs for any matching URLs.
The SpringerLink plugin uses the classmethod find_by_journal_url() of models.Publisher, which searches by an exact match for a URL. This needs to be done in reverse for a regex plugin, i.e. to identify the relevant publishers we need to ask whether any of the registered URLs for any config is matched by the plugin's regex.
The models.Publisher classmethod all_journal_urls() may be the way to do this? Or do we need to expose another class method which tests for a match to a regex?
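As a rough illustration of the "reverse" lookup, the sketch below runs one plugin regex over the journal URLs registered for each publisher and returns the publishers with at least one hit. The `PUBLISHERS` list is an in-memory stand-in with assumed field names. It does not call models.Publisher, and the actual return shape of all_journal_urls() is not assumed here.

```python
import re

# Made-up stand-in for publisher records; not the real models.Publisher data.
PUBLISHERS = [
    {
        "publisher_name": "Example Press",
        "journal_urls": [
            "http://abc.examplejournals.org",
            "http://xyz.examplejournals.org",
        ],
    },
    {
        "publisher_name": "Other House",
        "journal_urls": ["http://www.otherhouse.net/journal"],
    },
]


def publishers_matched_by(plugin_regex, publishers):
    """Return publishers with at least one registered URL matched by the plugin regex."""
    pattern = re.compile(plugin_regex)
    return [
        pub
        for pub in publishers
        if any(pattern.search(url) for url in pub["journal_urls"])
    ]


matched = publishers_matched_by(r"^https?://[a-z]+\.examplejournals\.org", PUBLISHERS)
print([pub["publisher_name"] for pub in matched])
```

If something like this were exposed as a class method, it would sit next to find_by_journal_url() and take the regex as its argument instead of an exact URL.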