Support script property in regexes #62

Open
nikkiwd opened this issue Oct 30, 2023 · 1 comment

nikkiwd commented Oct 30, 2023

It would be useful to support script properties in regexes, as described at http://www.unicode.org/reports/tr18/tr18-19.html#Script_Property

According to google/re2#234, these should work if RE2 is built against ICU.

For example, Perl supports all of the following forms:

  • \p{Hira}
  • \p{sc=Hira}
  • \p{scx=Hira}
  • \p{Hiragana}
  • \p{sc=Hiragana}
  • \p{scx=Hiragana}

Of these, the only one that works in QLever is \p{Hiragana} (https://qlever.cs.uni-freiburg.de/wikidata/5Zj8W6). Each of the others produces an error like:

Invalid SPARQL query: The regex "\p{sc=Hiragana}" is not supported by QLever (which uses Google's RE2 library). Error from RE2 is: invalid character class range: \p{sc=Hiragana}

Wikidata's query service supports only \p{sc=Hira} and \p{sc=Hiragana} (https://w.wiki/7xjr), so supporting those two forms in particular would make it easier to write queries that work in both places.
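
One possible workaround, if rebuilding RE2 against ICU is not an option, would be to rewrite the sc= forms into the bare long name that already works before the pattern reaches RE2. The following is only a rough sketch of that idea in Python, not QLever's actual code: the function name normalize_script_properties and the SCRIPT_ALIASES table are hypothetical, the table lists only a few entries, and it assumes RE2 accepts the long names shown (only \p{Hiragana} is confirmed above). It deliberately leaves scx= untouched, since Script_Extensions is a different property from Script.

```python
import re

# Hypothetical alias table: four-letter script codes -> long script names.
# Only a few entries are shown for illustration; a complete table would be
# derived from Unicode's property value aliases.
SCRIPT_ALIASES = {
    "Hira": "Hiragana",
    "Kana": "Katakana",
    "Latn": "Latin",
}

# Match every backslash escape so that an escaped backslash (\\) is consumed
# as a unit and never mistaken for the start of \p{...}. Groups 1 and 2 are
# set only when the escape is \p{sc=Name}, \P{sc=Name}, or the script= form.
_ESCAPE = re.compile(r"\\(?:([pP])\{(?:sc|script)=(\w+)\}|.)", re.DOTALL)


def normalize_script_properties(pattern: str) -> str:
    """Rewrite \\p{sc=Name} to \\p{LongName}; leave everything else as-is."""

    def rewrite(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        if kind is None:
            # Some other escape sequence (\d, \\, \p{Hiragana}, ...).
            return match.group(0)
        long_name = SCRIPT_ALIASES.get(name, name)
        return "\\" + kind + "{" + long_name + "}"

    return _ESCAPE.sub(rewrite, pattern)


if __name__ == "__main__":
    for example in (r"\p{sc=Hira}", r"\p{sc=Hiragana}", r"\p{scx=Hira}"):
        print(example, "->", normalize_script_properties(example))
```

With a rewrite like this, the two forms that work in Wikidata's query service would both be turned into \p{Hiragana}, while \p{scx=...} would still be rejected rather than silently given the wrong meaning.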

hannahbast (Member) commented

@nikkiwd Interesting. This is an issue for https://github.com/ad-freiburg/qlever and not for the QLever UI. Any idea how to fix this? We don't do anything special to inhibit this and we weren't aware of this feature so far.
