Syntax Status and Roadmap #63
Lots of source location tracking in #67. After that I'll probably start focusing more on other areas of the project.
Escaped backreferences, including maybe-octal sequences, in #88
Option parsing in #91
Swift-specific options for switching between matching semantic levels: #112
Conditional patterns in #113
PCRE callouts, backtracking directives, and .NET balanced captures in #117
PCRE global options and Oniguruma recursion levels in #123. After that, it's just the extended syntax, Oniguruma callouts, and absent functions.
Remaining Oniguruma-specific syntax in #129
Extended syntax in #136
Another one is Unicode scalar sequences à la https://unicode.org/reports/tr18/#RL1.1
@hamishknight, can you go over this and see what needs to be tracked as an issue for this release and what needs to go into #370?
We've completed the syntax feature work here; future syntax work is being tracked by #370.
For the regex literal syntax, we're looking at supporting a syntactic superset of:
- PCRE2, an "industry standard" of sorts, and a rough superset of Perl, Python, etc.
- Oniguruma, an internationalization-oriented engine with some modern features
- ICU, used by NSRegularExpression, a Unicode-focused engine
- Our interpretation of UTS#18's guidance, which is about semantics but from which we can infer syntactic feature sets
- TODO: .NET, which has delimiter balancing and some interesting minor details on conditional patterns
These aren't all strictly compatible (e.g. `&&` inside a character class is set intersection in ICU and Oniguruma, whereas in PCRE2 it would just redundantly list `&` as a set member). We can explore adding strict compatibility modes, but in general the syntactic superset is fairly straightforward.
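In practice, accepting a superset mostly means mapping several spellings onto one syntax-tree node. For example, PCRE2 alone accepts `\k<name>`, `\k'name'`, `\k{name}`, `\g{name}` and the Python-style `(?P=name)` for the same named backreference. The sketch below is hypothetical: `Reference` and `parseBackreference` are illustrative names, not the package's API, and only complete, absolute backreference tokens are handled.

```swift
// Hypothetical AST node: every spelling below denotes either a numbered
// or a named capture group. Not the package's actual type.
enum Reference: Equatable {
    case number(Int)
    case name(String)
}

// Sketch: map one complete backreference token onto a `Reference`.
// Relative (`\g{-1}`) and recursion-level (`\k<n+level>`) forms are
// deliberately left out; anything unrecognized yields nil.
func parseBackreference(_ token: String) -> Reference? {
    // The text strictly between `prefix` and `suffix`, if the token has that shape.
    func between(_ prefix: String, _ suffix: String) -> String? {
        guard token.hasPrefix(prefix), token.hasSuffix(suffix),
              token.count > prefix.count + suffix.count else { return nil }
        return String(token.dropFirst(prefix.count).dropLast(suffix.count))
    }
    // A payload made only of ASCII digits names a group by number.
    func number(_ body: String) -> Int? {
        guard body.allSatisfy({ $0.isASCII && $0.isNumber }) else { return nil }
        return Int(body)
    }
    // Delimited spellings accept either a number or a name.
    let delimited: [(String, String)] = [
        ("\\g{", "}"), ("\\k{", "}"), ("\\k<", ">"), ("\\k'", "'"), ("(?P=", ")"),
    ]
    for (prefix, suffix) in delimited {
        if let body = between(prefix, suffix) {
            if let n = number(body) { return .number(n) }
            return .name(body)
        }
    }
    // Bare spellings `\g2` and `\1` accept only a number.
    for prefix in ["\\g", "\\"] where token.hasPrefix(prefix) {
        if let n = number(String(token.dropFirst(prefix.count))) { return .number(n) }
    }
    return nil
}

let spellings = ["\\k<name>", "\\k'name'", "\\k{name}", "\\g{name}", "(?P=name)"]
let allSame = spellings.allSatisfy { parseBackreference($0) == Reference.name("name") }
print(allSame)  // true
```

Once the spellings collapse into one node, later stages never need to know which engine's syntax the user wrote.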
Status
The following are (roughly) implemented. There may be bugs, but we have some support and some test coverage:
- `a|b`
- `(x)`, `(?:x)`, `(?<name>x)`
- `\n`, `\a`
- `\u{...}`, `\x{...}`, `\uHHHH`
- `.`, `\d`, `\w`, `\s`
- `[...]`, including binary operators `&&`, `~~`, `--`
- `x?`, `x+`, `x*`, `x{n,m}`
- `\b`, `^`, `$`
- `\Q ... \E`
- `(?#comment)`
- `\p{...}`, `[:...:]`
- `\N{...}`, `\N{U+hh}`
- `(?=)`, `(?!)`, `(*pla:)`, `(?*...)`, `(?<*...)`, `(*napla:...)`
- `(*script_run:...)`, `(*sr:...)`, `(*atomic_script_run:...)`, `(*asr:...)`
- `\ddd`, `\o{...}`
- `\1`, `\g2`, `\g{2}`, `\k<name>`, `\k'name'`, `\g{name}`, `\k{name}`, `(?P=name)`
- `(?m)`, `(?-i)`, `(?:si)`, `(?^m)`
- `\g<n>`, `\g'n'`, `(?R)`, `(?1)`, `(?&name)`, `(?P>name)`
- `(?(R)...)`, `(?(n)...)`, `(?(<n>)...)`, `(?('n')...)`, `(?(condition)then|else)`
- `(?C2)`, `(?C"text")`
- `(*ACCEPT)`, `(*SKIP:NAME)`
- `(?<name1-name2>...)`
- `\k<n+level>`, `(?(n+level))`
- `(?{...})`, `(*name)`
  - `(?{...})` has in-line code in it; we could consider supporting in-line code as well (for now, we just parse the contents as an arbitrary string)
- `(?~absent)`
- `(*LIMIT_MATCH=d)`, `(*LF)`
- `(?x)`/`(?xx)` syntax allowing for non-semantic whitespace and end-of-line comments: `abc # comment`
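The effect of `(?x)` can be approximated by a pre-pass that drops unescaped whitespace and `#` comments running to the end of the line. This is a hypothetical sketch, not how the package implements extended syntax: `stripExtendedSyntax` is an illustrative name, and it ignores character classes entirely (in PCRE2, `(?x)` keeps whitespace inside `[...]` significant, while `(?xx)` also ignores unescaped spaces and tabs there).

```swift
// Hypothetical pre-pass approximating `(?x)`: remove unescaped whitespace
// and `#` comments that run to the end of the line. Simplification:
// character classes `[...]` are NOT treated specially here.
func stripExtendedSyntax(_ pattern: String) -> String {
    var result = ""
    var inComment = false
    var escaped = false
    for ch in pattern {
        if inComment {
            // A comment ends at the newline, which is itself dropped.
            if ch.isNewline { inComment = false }
            continue
        }
        if escaped {
            // Keep the escaped character verbatim, e.g. `\ ` or `\#`.
            result.append(ch)
            escaped = false
        } else if ch == "\\" {
            result.append(ch)
            escaped = true
        } else if ch == "#" {
            inComment = true
        } else if !ch.isWhitespace {
            result.append(ch)
        }
    }
    return result
}

let verbose = """
    abc   # match the literal text
    \\d+  # then one or more digits
    """
print(stripExtendedSyntax(verbose))  // abc\d+
```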
Experimental syntax
Additionally, we have (even more experimental) support for some syntactic conveniences, when explicitly enabled. Note that each of these (except perhaps ranges) may introduce a syntactic incompatibility with existing traditional-syntax regexes, so they are mostly illustrative: they show what happens, and where we end up, as we slide down this "slippery slope".
- `/a b c/ === /abc/`
- `/"a.b"/ === /\Qa.b\E/`
- `/a{2..<10} b{...3}/ === /a{2,9}b{0,3}/`
- `/a (_: b) c/ === /a(?:b)c/`
TBD:
- `/a (name: b) c/ === /a(?<name>b)c/`
- `/* comment */` or `// comment` instead of `(?#. comment)`
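The range-quantifier example in the experimental syntax amounts to a desugaring into a traditional `{n,m}` bound. A minimal sketch, assuming the bounds are non-negative integer literals around Swift's `..<` or `...` operator; `lowerRangeQuantifier` is a hypothetical helper that takes the text between the braces and returns nil for anything else.

```swift
// Hypothetical desugaring of the experimental range quantifiers, e.g.
// `{2..<10}` -> `{2,9}` and `{...3}` -> `{0,3}`. Takes the text between
// the braces; returns nil if it is not a well-formed, non-empty range.
func lowerRangeQuantifier(_ body: String) -> String? {
    // Try the half-open operator first; `...` never contains `<`.
    for op in ["..<", "..."] {
        guard let start = body.indices.first(where: { body[$0...].hasPrefix(op) }) else {
            continue
        }
        let lowerText = body[..<start]
        let upperText = body[body.index(start, offsetBy: op.count)...]
        // A missing lower bound means zero; otherwise it must parse.
        let lower: Int
        if lowerText.isEmpty { lower = 0 } else if let n = Int(lowerText) { lower = n } else { return nil }
        // A missing upper bound is only meaningful for `...` (unbounded).
        if upperText.isEmpty {
            return op == "..." ? "{\(lower),}" : nil
        }
        guard let bound = Int(upperText) else { return nil }
        // `..<` excludes its upper bound, so subtract one.
        let upper = op == "..<" ? bound - 1 : bound
        guard lower >= 0, upper >= lower else { return nil }
        return "{\(lower),\(upper)}"
    }
    return nil
}

print(lowerRangeQuantifier("2..<10") ?? "invalid")  // {2,9}
print(lowerRangeQuantifier("...3") ?? "invalid")    // {0,3}
```

Because `..<` excludes its upper bound, `{2..<10}` becomes `{2,9}`, while the closed `...3` keeps its bound and becomes `{0,3}`, matching the table row.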
Swift's syntactic additions
- `X`: grapheme cluster semantics
- `O`: Unicode scalar semantics
- `b`: byte semantics
Source location tracking
Implemented:
- `|` in alternation
- `-` in `[a-f]`
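Tracking the location of `|` means recording where each bar sits in the pattern, not just that an alternation exists. A hypothetical sketch (`alternationBarOffsets` is an illustrative name, not the package's API) that reports the character offsets of the bars separating top-level alternatives, skipping escapes, bars nested in groups, and bars inside simple character classes. It assumes well-formed input and does not handle `\Q...\E` or extended-syntax comments.

```swift
// Hypothetical sketch: offsets (in Characters) of the `|` bars that
// separate top-level alternatives. Escapes, groups and simple
// character classes are skipped; `\Q...\E` and `(?x)` comments are not handled.
func alternationBarOffsets(_ pattern: String) -> [Int] {
    var offsets: [Int] = []
    var groupDepth = 0
    var inClass = false
    var escaped = false
    for (offset, ch) in pattern.enumerated() {
        // The character after a backslash is never syntax.
        if escaped { escaped = false; continue }
        switch ch {
        case "\\": escaped = true
        case "[" where !inClass: inClass = true
        case "]" where inClass: inClass = false
        case "(" where !inClass: groupDepth += 1
        case ")" where !inClass: groupDepth -= 1
        case "|" where !inClass && groupDepth == 0: offsets.append(offset)
        default: break
        }
    }
    return offsets
}

// The bars at offsets 1 and 10 split the pattern into `a`, `(b|c)[|]` and `d`.
print(alternationBarOffsets("a|(b|c)[|]|d"))  // [1, 10]
```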
TBD:
Integration with the Swift compiler
Initial parser support landed in swiftlang/swift#40595, using the delimiters `'/.../'`, which are lexed in-package.
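For lexing, the key question is where a literal ends. A hypothetical sketch of that scan, not the actual lexer in either the compiler or the package: `regexLiteralContents` is an illustrative name, and it assumes that a literal opens with `'/`, closes at the first unescaped `/'`, and that a backslash escapes the character after it.

```swift
// Hypothetical sketch: given source text starting at an opening `'/`,
// return the regex contents between the delimiters, or nil if the
// literal is unterminated. Assumes a backslash escapes the next character.
func regexLiteralContents(_ source: String) -> Substring? {
    guard source.hasPrefix("'/") else { return nil }
    // Contents begin right after the two-character opening delimiter.
    let contentStart = source.index(source.startIndex, offsetBy: 2)
    var i = contentStart
    while i < source.endIndex {
        if source[i] == "\\" {
            // Skip the backslash and the character it escapes.
            i = source.index(after: i)
            if i < source.endIndex { i = source.index(after: i) }
            continue
        }
        if source[i...].hasPrefix("/'") {
            // First unescaped closing delimiter: return what lies between.
            return source[contentStart..<i]
        }
        i = source.index(after: i)
    }
    return nil  // ran off the end: unterminated literal
}

print(regexLiteralContents("'/a+b/' + rest") ?? "unterminated")  // a+b
```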