set regex pattern for CURIE and add URL, fixes #400 #406

sierra-moxon · 2023-03-16T16:58:17Z

The regexes are fairly permissive, since CURIE syntax "in the wild" may not conform to prefix best practices.
The ticket called for a URL type. If we want a semantic URI type (where the URI can be the subject of an RDF triple), we should be much more restrictive. Once we have a specific use case for that, I'd be happy to refine it.
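To illustrate the permissive-versus-strict trade-off described above, here is a minimal sketch. Both patterns and all sample values are assumptions made up for illustration; neither pattern is the one committed in this PR.

```python
import re

# Both patterns are illustrative assumptions, not the ones in this PR.
# Permissive: any whitespace-free prefix without a colon, a colon, then a
# whitespace-free local part.
PERMISSIVE_CURIE = re.compile(r'[^\s:]+:\S+')
# Stricter: the prefix must start with a letter or underscore.
STRICT_CURIE = re.compile(r'[A-Za-z_][A-Za-z0-9_.-]*:\S+')


def classify(value: str) -> tuple[bool, bool]:
    """Return (permissive_match, strict_match) for a candidate CURIE."""
    return (
        bool(PERMISSIVE_CURIE.fullmatch(value)),
        bool(STRICT_CURIE.fullmatch(value)),
    )


# Sample identifiers chosen only to exercise the two patterns.
for candidate in ['PMID:123', 'CHEMBL.COMPOUND:CHEMBL25', '3dmet:B00162',
                  'foo', 'my prefix:1']:
    permissive, strict = classify(candidate)
    print(f"{candidate!r}: permissive={permissive} strict={strict}")
```

A prefix that starts with a digit passes the permissive pattern but fails the stricter one, which is the kind of "in the wild" value a tighter rule would start rejecting.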

edeutsch

Okay with me in principle, but I don't think the regular expression is helpful. Is there a "standard" regexp for URLs out there on Stack Overflow or somewhere?

edeutsch · 2023-03-17T04:09:18Z

TranslatorReasonerAPI.yaml

+      description: >-
+      externalDocs:
+        url: https://www.ietf.org/rfc/rfc3987.txt
+      pattern: ^(http(s)?:\/\/.)?(www\.)?\S+$


Doesn't this regular expression match anything without a space? Is that really helpful? Testing this regexp:

    #!/bin/env python3
    import re

    inputs = [
        'http://arax.ncats.io',
        'foo',
        '@*$&@#',
        'PMID:123',
        'http://peptideatlas.org/tmp/hello world.txt',
    ]
    for input in inputs:
        match = re.search(r'^(http(s)?:\/\/.)?(www\.)?\S+$', input)
        if match:
            print(f"MATCHES {input}")
        else:
            print(f" x {input}")

yields

    MATCHES http://arax.ncats.io
    MATCHES foo
    MATCHES @*$&@#
    MATCHES PMID:123
     x http://peptideatlas.org/tmp/hello world.txt

Only the last one fails.

Yet if you paste that into your browser, it works!

Maybe this is useful with a more restrictive regular expression?
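As a rough sketch of what "more restrictive" could look like, the snippet below compares the pattern from the diff with a hypothetical helper, `looks_like_http_url`, built on the standard library's `urllib.parse.urlsplit`. The helper and its rules (http/https scheme, non-empty host, no whitespace) are assumptions for illustration, not something proposed in this PR. Note that it also rejects the last sample even though a browser accepts it.

```python
import re
from urllib.parse import urlsplit

# Pattern copied from the diff under review.
PR_PATTERN = re.compile(r'^(http(s)?:\/\/.)?(www\.)?\S+$')


def looks_like_http_url(value: str) -> bool:
    """Hypothetical stricter check: http(s) scheme, a host, no whitespace."""
    if re.search(r'\s', value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit raises for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parts.scheme in ('http', 'https') and bool(parts.netloc)


samples = [
    'http://arax.ncats.io',
    'foo',
    '@*$&@#',
    'PMID:123',
    'http://peptideatlas.org/tmp/hello world.txt',
    'http:// x',  # extra sample: the "." after "//" can match a space
]
for sample in samples:
    pattern_ok = bool(PR_PATTERN.search(sample))
    stricter_ok = looks_like_http_url(sample)
    print(f"{sample!r}: pattern={pattern_ok} stricter={stricter_ok}")
```

Because both optional groups in the pattern can match nothing, any non-empty string without whitespace passes it. The extra `'http:// x'` sample also passes, since the `.` following `//` accepts a space.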

sierra-moxon · Mar 17, 2023

My first thought was also to find a more comprehensive one. The more restrictive one I found and tested failed this repo's line-length linting, and before trying to break it across many lines I asked around for best practice. The feedback I got was that a very restrictive regex will mean constant tweaking to accommodate "in the wild" URLs and CURIEs. However, if we want a URI (not URL) type that is much more restrictive, we can do that.


vdancik

I think that specifying a regex in the specification is not necessary.

sierra-moxon added 2 commits March 16, 2023 09:51

tweak regexes to fit in the line length specified by tox.

2c12281

change name of parameter to reflect ticket

2a48936

sierra-moxon changed the base branch from master to 1.4 March 16, 2023 16:58

sierra-moxon requested a review from mbrush March 16, 2023 16:58

capasfield requested review from edeutsch, andrewsu, brettasmi, edgargaticaCU, tziomics, vdancik, vemonet, webyrd, yakaboskic, gprice1129, kennethmorton, marcdubybroad, MarkDWilliams and RichardBruskiewich March 17, 2023 14:45

edeutsch reviewed Mar 17, 2023


edeutsch added a commit that referenced this pull request Mar 21, 2023

Move #393 and #403 to finished pile. Add #405 and #406

672e1c1

fix description

dff281b

uhbrar pushed a commit to uhbrar/ReasonerAPI that referenced this pull request Mar 27, 2023

Move NCATSTranslator#393 and NCATSTranslator#403 to finished pile. Add …

de9c2f9

…NCATSTranslator#405 and NCATSTranslator#406

vdancik requested changes Mar 29, 2023


edeutsch added this to the v1.5 milestone Apr 27, 2023
