-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add options support to the compiler #112
Add options support to the compiler #112
Conversation
@@ -338,8 +338,6 @@ extension AST.Atom { | |||
switch kind { | |||
case let .escaped(b): return b.characterClass | |||
|
|||
case .any: return .any | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this means that the .any
case will be handled using a consumer function, rather than the old CharacterClass
type.
AST.MatchingOptionSet is an OptionSet representation of the AST.MatchingOption enum (I wish we had a language feature for this transformation). This also adds the capability for AST.MatchingOptionSequence to convert itself to an OptionSet, adding or removing options as needed. Include additional options for the three potential semantic levels.
Nothing's ever this simple -- `.` behavior depends on current options
This also makes a little progress in getting off the old CharacterClass / CC.MatchLevel types, and implements the correct matching behavior for `.` depending on matching level and the 's' flag.
55d3e6c
to
bf0a8f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the underlying functionality, but I want to make sure we put things in the right places.
@@ -12,7 +12,7 @@ | |||
extension AST { | |||
/// An option written in source that changes matching semantics. | |||
public struct MatchingOption: Hashable { | |||
public enum Kind { | |||
public enum Kind: Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this is Int
-backed, we can use the raw value as the bit offset for MatchingOptionSet.rawValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we'll want a model-layer enum or something that represents the compiler's semantic understanding of options rather than the parser's syntactic representation. I ask because this can be (but isn't necessarily!) a code smell that we're still conflating the syntax layer with the semantic layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is every option here something that would make sense in our DSL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several of these don't really make sense for the DSL:
extra
,extraExtended
: about literal parsing rather than actual matchingallowDuplicateGroupNames
,noAutoCapture
: the DSL will have a different way of capturingreluctantByDefault
: irrelevant (and/or terrible) for the DSL API
The rest affect how a pattern matches, but I don't know if we'll want to do them as "options" per se or as different function calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there options that make sense for the DSL but are not present in a literal? How divergent is our syntactic representation from our semantic model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the : Int
, make a model-layer enum with our option model, function to lower from AST to that model.
// Swift semantic matching level | ||
case graphemeClusterSemantics // X | ||
case unicodeScalarSemantics // u | ||
case byteSemantics // b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these existing options, or are you defining Swift-specific syntax?
We'll want to be able to lower this AST node (entirely) into a custom OptionSet-like model type using your merge semantics below.
But, we do need a semantics for each option (even if that's just unsupported("reasons")
).
@hamishknight what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah these are Swift-specific, they don't conflict with the matching options from other engines tho AFAIK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep these well-delineated in that case, and be very clear that this is something we're formally proposing as an addition to existing regex syntax. Should we prefix the names with swift
to be clearer?
We should also make a note or sub-section in #63 for these kinds of syntactic additions. They're otherwise easy to lose track of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to #63.
|
||
public var isSemanticMatchingLevel: Bool { | ||
switch kind { | ||
case .graphemeClusterSemantics, .unicodeScalarSemantics, .byteSemantics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do other engines have ways of switching semantic models between byte and scalar? How should we lower those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only existing engine I've found that has this semantic switching is Rust's regex::bytes
, which lets you switch from the default byte-matching mode into a Unicode mode with (?u)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does .textSegmentGraphemeMode
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Oniguruma, .textSegmentGraphemeMode
and .textSegmentWordMode
are spelled y{g}
and y{w}
, and change the behavior of \y
and \Y
between marking grapheme boundaries and marking word boundaries (as defined by https://unicode.org/reports/tr29/). That in turn changes the behavior of \X
, since it consumes up to the next \y
boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll probably want a DSL equivalent for everything we support here. Have you thought about that at all, something like a TextSegmentBoundary
which will read in-coming context to determine what to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we'll need some way to configure DSL-defined patterns, for these, case insensitivity, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts there? This seems relevant to whether we should be using an enum in the AST as the same type/representation for our model.
case "X": return advanceAndReturn(.graphemeClusterSemantics) | ||
case "u": return advanceAndReturn(.unicodeScalarSemantics) | ||
case "b": return advanceAndReturn(.byteSemantics) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this novel syntax we're talking about adding, or does this exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would be Swift-specific flags. The X
flag is to match the \X
metacharacter, which matches a grapheme cluster; given that, it might be better to go with O
for Unicode scalar semantics to match \O
. I go through in the flags write-up.
@@ -338,8 +338,6 @@ extension AST.Atom { | |||
switch kind { | |||
case let .escaped(b): return b.characterClass | |||
|
|||
case .any: return .any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening to CharacterClass.any
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable line for someone to add. Could you explicitly return nil with a comment explaining why we don't want to produce a character class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(outside the scope of this PR)
Or for that matter, should we remove or redesign CharacteClass
? It originally tried to fit 3 rather distinct purposes, and it still awkwardly conflates the latter 2:
- Syntactic-information carrying types for the AST, for source tools and compilation consumption
- Model types to assist in compilation and/or run-time call outs for the matching engine
- API types appearing as public API from the stdlib.
We got it off the AST a while back, but we still have this as being public
in the _StringProcessing
module.
We definitely will want some type that people can use as namespaces to look up built-in sets, and a type for registering their own via closures (either Character
predicates or Character
or Scalar
consumer closures), etc. We could over time evolve this to have set-algebraic operations on it, etc.
We will separately want a model-layer type to assist in compilation. That might be the same type if it happens to align, but I definitely don't want to try to burden an API type with that need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable line for someone to add. Could you explicitly return nil with a comment explaining why we don't want to produce a character class?
thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note about where .any
is being handled.
) throws -> Program<String>.ConsumeFunction? { | ||
// TODO: Wean ourselves off of this type... | ||
if let cc = self.characterClass?.withMatchLevel(opts) { | ||
let matchLevel: CharacterClass.MatchLevel = opts.contains(.unicodeScalarSemantics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to be calling richer semantic API here, so that changes to exactly what combinations of options mean can be abstracted and in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think we need to get rid of CharacterClass.MatchLevel
altogether. I'll handle that in a different change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a set-membership check of a syntactic enum feels a bit like a smell. What if there are multiple ways to spell it, or it came in via API?
Should this instead be a isScalarSemantic: Bool
on the regex options model type, or perhaps this match level enum is vended by that type?
return opts.contains(.graphemeClusterSemantics) | ||
? input.index(after: curIndex) | ||
: input.unicodeScalars.index(after: curIndex) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely something we're going to want to represent in the matching engine directly.
When a group sets options with (?^abc), the options should reset before the new options are applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a lot better after the refactoring, thanks!
I think we should make a more semantically rich type for the compiler. E.g. a type MatchingOptions
which can give a semantic-level enum, a line-handling-strategy enum, assist in choosing what character class definition to use, etc.
It could have a mutating func beginScope(AST.Options)
, mutating func endScope()
, and mutating func override/merge(AST.Options)
. This model might naturally support hierarchical options from API too. (that is, API would be implemented in terms of this, not that this would necessarily be the API).
We probably want to get as far away from thinking in terms of AST options as we can at this level, other than how to lower them into this representation. There are other options than just those in a literal and other ways to spell or specify options (e.g. http://pcre.org/current/doc/html/pcre2unicode.html). Some options have their own state machine governing them, some don't, etc.
@@ -599,6 +599,10 @@ extension Source { | |||
if opt.isTextSegmentMode { | |||
throw ParseError.cannotRemoveTextSegmentOptions | |||
} | |||
// Matching semantics options can only be added, not removed. | |||
if opt.isSemanticMatchingLevel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is text segment option and how is it related to these? What should we do with them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're pretty much orthogonal — text-segment mode determines how \y
and \Y
behave, the semantic matching level determines what string view .
and the character classes operate on.
@@ -12,7 +12,7 @@ | |||
extension AST { | |||
/// An option written in source that changes matching semantics. | |||
public struct MatchingOption: Hashable { | |||
public enum Kind { | |||
public enum Kind: Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we'll want a model-layer enum or something that represents the compiler's semantic understanding of options rather than the parser's syntactic representation. I ask because this can be (but isn't necessarily!) a code smell that we're still conflating the syntax layer with the semantic layer.
// Swift semantic matching level | ||
case graphemeClusterSemantics // X | ||
case unicodeScalarSemantics // u | ||
case byteSemantics // b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep these well-delineated in that case, and be very clear that this is something we're formally proposing as an addition to existing regex syntax. Should we prefix the names with swift
to be clearer?
We should also make a note or sub-section in #63 for these kinds of syntactic additions. They're otherwise easy to lose track of.
|
||
public var isSemanticMatchingLevel: Bool { | ||
switch kind { | ||
case .graphemeClusterSemantics, .unicodeScalarSemantics, .byteSemantics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does .textSegmentGraphemeMode
do?
@@ -90,6 +91,8 @@ extension ParseError: CustomStringConvertible { | |||
return "expected group specifier" | |||
case .cannotRemoveTextSegmentOptions: | |||
return "text segment mode cannot be unset, only changed" | |||
case .cannotRemoveSemanticsOptions: | |||
return "matching semantics cannot be unset, only changed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we will probably want a better name than "matching semantics". This is more so specifying which model of String we will run the declared algorithm over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a syntactic error, maybe collapsing both of these errors into one that includes the incorrectly-unset flag would be better, anyway.
@@ -545,7 +554,7 @@ extension Unicode.POSIXProperty { | |||
extension Unicode.ExtendedGeneralCategory { | |||
// FIXME: Semantic level | |||
func generateConsumer( | |||
_ opts: CharacterClass.MatchLevel | |||
_ opts: MatchingOptionSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any practical difference between the type that does stack logic and the this type?
} | ||
|
||
fileprivate mutating func insert(_ kind: Self) { | ||
self.rawValue |= kind.rawValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertions?
|
||
extension MatchingOptionSet { | ||
init(_ kind: AST.MatchingOption.Kind) { | ||
self.rawValue = 1 << kind.rawValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.... are we certain that syntactic options correspond 1-1 to semantic concerns?
} | ||
|
||
var top: MatchingOptionSet { | ||
_invariantCheck() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, it's better to call _invariantCheck()
at the end of every init, usually by having every init relegate to a common raw-value init, i.e. we're laying our initializations to always check our assumptions.
Then, we only have to make sure that mutating methods, which don't boil down to new .init
calls, check that these invariants are maintained at the end.
I.e. ideally we'd structure code such that we'd never need to call it from a getter, unless there's some way for mutations to occur outside the domain of its interfaces.
} | ||
|
||
/// A never-empty stack of `MatchingOptionSet`s. | ||
struct MatchingOptionSetStack { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the compilers perspective, why are there two separate types? It could be that we can further abstract/encapsulate both the concept of what the current options say we should emit and the concept of scopes and option switching.
This converts MatchingOptionSet & ...Stack to a single interface that is more semantically oriented toward the compiler.
@milseman Thanks for the feedback! I've moved to a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking a lot better! I have a couple questions or clarifications then this looks merge-ready
|
||
public var isSemanticMatchingLevel: Bool { | ||
switch kind { | ||
case .graphemeClusterSemantics, .unicodeScalarSemantics, .byteSemantics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll probably want a DSL equivalent for everything we support here. Have you thought about that at all, something like a TextSegmentBoundary
which will read in-coming context to determine what to do?
@@ -338,8 +338,6 @@ extension AST.Atom { | |||
switch kind { | |||
case let .escaped(b): return b.characterClass | |||
|
|||
case .any: return .any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable line for someone to add. Could you explicitly return nil with a comment explaining why we don't want to produce a character class?
thoughts?
@@ -12,7 +12,7 @@ | |||
extension AST { | |||
/// An option written in source that changes matching semantics. | |||
public struct MatchingOption: Hashable { | |||
public enum Kind { | |||
public enum Kind: Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is every option here something that would make sense in our DSL?
@@ -110,6 +106,10 @@ class Compiler { | |||
try emit(g.child) | |||
builder.buildEndCapture(cap) | |||
|
|||
case .changeMatchingOptions(let optionSequence, _): | |||
options.replaceCurrent(optionSequence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a replacement, or a merging-in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can actually be either one, depending on what optionSequence
holds… I've changed both replaceCurrent
and the merge
method to apply(sequence)
.
: .eager | ||
} else { | ||
kind = quant.kind.value | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more broadly useful for this to be a method on the options? Something like remap(_ kind: AST.Quantification.Kind) -> AST.Quantification.Kind
?
I'm also thinking that we might want to pull some of these concepts out of the AST, especially if we want to surface them through the DSL. My DSLTree prototypes have these inside them.
} | ||
for opt in sequence.removing { | ||
remove(.init(opt.kind)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hamishknight can you double check the logic and make sure this does what you expect it to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, though I actually realized that the x
and xx
options are also mutually exclusive, which we'll likely want to handle here (e.g unsetting x
unsets xx
). Though I'm happy to take care of that when implementing the parsing for PCRE extended syntax as we'll need some way of tracking the x
and xx
options in the parser for that. Perhaps the parser could use this semantic model to do that, or else we maybe could implement a simpler version of it just for the x
and xx
options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x
and xx
are necessarily irrelevant by this stage, right? They affect parsing, but don't have any continued impact during compilation. Do we want to allow them in this grouped/intra-literal position, or should they only be permitted as "global" options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah they're only relevant to the parser, my thinking was that if we wanted the parser to also be able to use this semantic model to track which of the syntax options is enabled in a given scope, then we could handle them here, or if the parser implements its own logic for that, we could disregard them here and just have this be specific to the matching engine.
Do we want to allow them in this grouped/intra-literal position, or should they only be permitted as "global" options?
As I understand it, they can be set and unset arbitrarily throughout the regex, e.g (?x)a b c(?-x)d e f
, only the whitespace in a b c
is treated as non-semantic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't realize! I've added support for their exclusivity here, too, and can implement whitespace discarding in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
can implement whitespace discarding in a follow-up
Shouldn't this be a case of the parser producing Trivia
nodes, and the matching engine disregarding them? FWIW I have a patch that implements that a83ca4d. I'm not sure it could be done completely in the matching engine, as PCRE has some funky rules that means e.g x * +
is a quantification under extended syntax (my patch doesn't yet implement it tho).
|
||
mutating func replaceCurrent(_ sequence: AST.MatchingOptionSequence) { | ||
stack[stack.count - 1].merge(with: sequence) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe call this merge
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone with apply
for both, since it's sometimes "merge" and sometimes "replace".
} | ||
} | ||
|
||
// Deprecated CharacterClass.MatchLevel API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who calls these?
fileprivate init(unchecked kinds: AST.MatchingOption.Kind...) { | ||
self.rawValue = 0 | ||
for kind in kinds { | ||
self.rawValue |= 1 << kind.rawValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if these are mutually exclusive? Can we at least assert, if not merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the exclusivity invariants into MatchingOptions
— the Representation
option set is just storage.
# Conflicts: # Sources/_StringProcessing/ConsumerInterface.swift
|
||
public var isSemanticMatchingLevel: Bool { | ||
switch kind { | ||
case .graphemeClusterSemantics, .unicodeScalarSemantics, .byteSemantics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts there? This seems relevant to whether we should be using an enum in the AST as the same type/representation for our model.
@@ -12,7 +12,7 @@ | |||
extension AST { | |||
/// An option written in source that changes matching semantics. | |||
public struct MatchingOption: Hashable { | |||
public enum Kind { | |||
public enum Kind: Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there options that make sense for the DSL but are not present in a literal? How divergent is our syntactic representation from our semantic model?
case .property: | ||
// TODO: Would our model type for character classes include | ||
// this? Or does grapheme-semantic mode complicate that? | ||
return nil | ||
|
||
case .any: | ||
// 'any' is handled by Compiler.emitAny(), not `CharacterClass` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the current options, .any
can be just a simple advance instruction that doesn't require a call to a consumer closure, so it's better to handle it a level up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a fatal error then? Unfortunately this is trying to make a CharacterClass, and there's an any
class, so it's weird to just return nil
from this.
/// stack-based scoping. | ||
struct MatchingOptions { | ||
/// A set of matching options. | ||
fileprivate struct Representation: OptionSet, RawRepresentable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this in an extension?
@@ -12,7 +12,7 @@ | |||
extension AST { | |||
/// An option written in source that changes matching semantics. | |||
public struct MatchingOption: Hashable { | |||
public enum Kind { | |||
public enum Kind: Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the : Int
, make a model-layer enum with our option model, function to lower from AST to that model.
@swift-ci Please test |
Co-authored-by: Michael Ilseman <[email protected]>
@swift-ci please test linux platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
MatchingOptions provides an interface for the compiler to manage group-scoped matching options, to apply matching option sequences from the AST, and to query when building out matching behavior. Includes support and tests for the `s` and `u` option flags.
This adds support for tracking options to the Regex compiler. Most options still don't have any effect, but I've added support for the
s
flag (single-line) and limited support foru
(Unicode scalar semantics).Note: This is based on top of #102.