Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement semantic diagnostics #379

Merged
merged 4 commits into from
May 9, 2022
Merged

Conversation

hamishknight
Copy link
Contributor

@hamishknight hamishknight commented May 5, 2022

Start emitting errors for unsupported constructs, and other semantic errors such as duplicate group names.

Once we start emitting bytecode for regex at compile time, these errors could potentially be subsumed into the bytecode generator. But for now, implement them as a separate pass.

Resolves #357
Resolves #264
Resolves #116
Resolves #312

@hamishknight hamishknight mentioned this pull request May 5, 2022
@hamishknight hamishknight force-pushed the sema branch 2 times, most recently from 0df23fa to 9f4a821 Compare May 9, 2022 14:35
Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, unless there's a break between compiler/library interfaces. We really need to keep those stable and migrate incrementally.

case invalidReference(Int)
case duplicateNamedCapture(String)
case invalidCharacterClassRangeOperand
case invalidQuantifierRange(Int, Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same enum or separate enum? (I haven't thought about it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially added them as a separate enum, but it seemed cleaner to do it this way as they share all the same logic as other parser errors for e.g printing and catch block handling. This for example means they can use the same testing logic as other parsing tests. We could split them out in the future, but for now at least I think this is the simplest way to go.

Sources/_RegexParser/Regex/Parse/Parse.swift Outdated Show resolved Hide resolved
This allows specifying whether or not to perform
semantic checks on the AST. Some clients, e.g
syntax coloring, only care about the syntactic
structure. But other clients want errors to be
emitted for e.g unsupported constructs.
Start emitting errors for unsupported constructs,
and other semantic errors such as duplicate group
names.

Once we start emitting bytecode for regex at
compile time, these errors could potentially be
subsumed into the bytecode generator. But for now,
implement them as a separate pass.
Begin storing source location on capture lists,
and start erroring on duplicate named captures.
@hamishknight
Copy link
Contributor Author

@swift-ci please test

Sources/_RegexParser/Regex/Parse/Sema.swift Outdated Show resolved Hide resolved
Sources/_RegexParser/Regex/Parse/Sema.swift Outdated Show resolved Hide resolved
Sources/_RegexParser/Regex/Parse/Sema.swift Outdated Show resolved Hide resolved
}

func validateQuantification(_ quant: AST.Quantification) throws {
try validateNode(quant.child)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to validate that the child isn't a zero-width assertion here? e.g. we want to reject \b+. Tracked in #312.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented logic to check for escape sequences that aren't quantifiable, how does it look?

Comment on lines +154 to +157
.extendedPictographic, .graphemeLink, .hyphen, .otherAlphabetic,
.otherDefaultIgnorableCodePoint, .otherGraphemeExtended,
.otherIDContinue, .otherIDStart, .otherLowercase, .otherMath,
.otherUppercase, .prependedConcatenationMark:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these .other* properties, we don't implement b/c they're included in e.g. \p{isAlphabetic}. Do you think we should redirect people to the corresponding properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, and I can take care of that in a follow-up

case .resetStartOfMatch, .singleDataUnit, .horizontalWhitespace,
.notHorizontalWhitespace, .verticalTab, .notVerticalTab,
// '\N' needs to be emitted using 'emitAny'.
.notNewline:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still to implement, but I'll fix up the validation then.

- Make `\h` and `\H` supported for now
- Check character class ranges
- Diagnose unquantifiable escape sequences
@hamishknight
Copy link
Contributor Author

@swift-ci please test

Copy link
Member

@natecook1000 natecook1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hamishknight hamishknight merged commit 7f068dc into swiftlang:main May 9, 2022
@hamishknight hamishknight deleted the sema branch May 9, 2022 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants