Implement semantic diagnostics #379

hamishknight · 2022-05-05T12:23:18Z

Start emitting errors for unsupported constructs, and other semantic errors such as duplicate group names.

Once we start emitting bytecode for regex at compile time, these errors could potentially be subsumed into the bytecode generator. But for now, implement them as a separate pass.

Resolves #357
Resolves #264
Resolves #116
Resolves #312

milseman

LGTM, unless there's a break between compiler/library interfaces. We really need to keep those stable and migrate incrementally.

milseman · 2022-05-09T16:04:35Z

Sources/_RegexParser/Regex/Parse/Diagnostics.swift

+  case invalidReference(Int)
+  case duplicateNamedCapture(String)
+  case invalidCharacterClassRangeOperand
+  case invalidQuantifierRange(Int, Int)


Same enum or separate enum? (I haven't thought about it)

I initially added them as a separate enum, but it seemed cleaner to do it this way, as they share all the same logic as other parser errors, e.g. for printing and catch-block handling. This means, for example, that they can use the same testing logic as other parsing tests. We could split them out in the future, but for now at least I think this is the simplest way to go.
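To illustrate the trade-off discussed here, a minimal sketch of syntactic and semantic cases living in one enum and sharing one description path and one catch clause. `SketchParseError`, `unbalancedParenthesis`, `describeFailure`, and all message wording are hypothetical; only the four semantic case names come from the diff above.

```swift
// Hypothetical, simplified stand-in for the parser's error enum. The real
// type in Diagnostics.swift has many more cases.
enum SketchParseError: Error, Equatable, CustomStringConvertible {
  // A purely syntactic error (name invented for this sketch).
  case unbalancedParenthesis
  // Semantic errors, using the case names added in this diff.
  case invalidReference(Int)
  case duplicateNamedCapture(String)
  case invalidCharacterClassRangeOperand
  case invalidQuantifierRange(Int, Int)

  // One description path serves both kinds of error. The wording of
  // each message is illustrative only.
  var description: String {
    switch self {
    case .unbalancedParenthesis:
      return "unbalanced parenthesis"
    case .invalidReference(let index):
      return "no capture numbered \(index)"
    case .duplicateNamedCapture(let name):
      return "group named '\(name)' already exists"
    case .invalidCharacterClassRangeOperand:
      return "invalid character class range operand"
    case .invalidQuantifierRange(let lower, let upper):
      return "lower bound \(lower) must not exceed upper bound \(upper)"
    }
  }
}

// A single typed catch clause handles syntactic and semantic failures alike,
// which is what lets tests reuse the same checking logic for both.
func describeFailure(_ body: () throws -> Void) -> String? {
  do {
    try body()
    return nil
  } catch let error as SketchParseError {
    return error.description
  } catch {
    return "unexpected error: \(error)"
  }
}
```

With a separate enum, the test helper would need a second typed catch clause and a second printing path.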

Sources/_RegexParser/Regex/Parse/Parse.swift

This allows specifying whether or not to perform semantic checks on the AST. Some clients, e.g. syntax coloring, only care about the syntactic structure. But other clients want errors to be emitted for, e.g., unsupported constructs.
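A rough sketch of how a stage parameter can gate the semantic pass. Everything here is a toy: `SketchStage`, `sketchParse`, `sketchValidate`, and the error cases are hypothetical names, the "AST" is just an array of characters, and the only semantic rule is rejecting `\K` (reset start of match) as unsupported. It is not the real `parse` signature.

```swift
// Hypothetical stage selector: syntax-only clients stop after parsing,
// other clients also run semantic validation.
enum SketchStage { case syntactic, semantic }

enum SketchStageError: Error, Equatable {
  case unbalancedParenthesis   // syntactic
  case unsupported(String)     // semantic
}

// Toy parser: the only syntactic rule is that parentheses balance.
// Escaped parentheses and character classes are ignored for brevity.
func sketchParse(_ pattern: String, _ stage: SketchStage) throws -> [Character] {
  var depth = 0
  for c in pattern {
    if c == "(" { depth += 1 }
    if c == ")" { depth -= 1 }
    if depth < 0 { throw SketchStageError.unbalancedParenthesis }
  }
  if depth != 0 { throw SketchStageError.unbalancedParenthesis }

  let ast = Array(pattern)
  // The semantic pass runs only when the caller asks for it.
  if stage == .semantic { try sketchValidate(ast) }
  return ast
}

// Toy semantic pass: reject a backslash immediately followed by 'K'.
// This does not account for an escaped backslash before the 'K'.
func sketchValidate(_ ast: [Character]) throws {
  for (c, next) in zip(ast, ast.dropFirst()) where c == "\\" && next == "K" {
    throw SketchStageError.unsupported("\\K")
  }
}
```

Under these assumptions a syntax-coloring client would call `sketchParse(pattern, .syntactic)` and still get a tree for `(a\Kb)`, while a compiling client passing `.semantic` gets the unsupported-construct error.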

Begin storing source location on capture lists, and start erroring on duplicate named captures.
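As a sketch of why the capture list needs a source location for this check: the duplicate can only be reported at the right place if each capture remembers where it came from. `SketchCapture`, `SketchLocatedError`, and `validateCaptures` are hypothetical, and a plain `Range<Int>` of character offsets stands in for the parser's real location type.

```swift
// Hypothetical capture-list entry.
struct SketchCapture {
  var name: String?          // nil for unnamed groups
  var location: Range<Int>   // stand-in for a real source location
}

struct SketchLocatedError: Error, Equatable {
  var message: String
  var location: Range<Int>
}

// Walk the capture list in order and report the second occurrence of any
// name, pointing at the location of the repeated group.
func validateCaptures(_ captures: [SketchCapture]) throws {
  var seen = Set<String>()
  for capture in captures {
    guard let name = capture.name else { continue }
    if !seen.insert(name).inserted {
      throw SketchLocatedError(
        message: "duplicate named capture '\(name)'",
        location: capture.location)
    }
  }
}
```

For a pattern such as `(?<a>x)(y)(?<a>z)`, the three captures would sit at offsets `0..<7`, `7..<10`, and `10..<17`, and the error would point at the third group.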

hamishknight · 2022-05-09T16:17:26Z

@swift-ci please test

Sources/_RegexParser/Regex/Parse/Sema.swift

natecook1000 · 2022-05-09T18:39:20Z

Sources/_RegexParser/Regex/Parse/Sema.swift

+  }
+
+  func validateQuantification(_ quant: AST.Quantification) throws {
+    try validateNode(quant.child)


Is it possible to validate that the child isn't a zero-width assertion here? e.g. we want to reject \b+. Tracked in #312.

I implemented logic to check for escape sequences that aren't quantifiable. How does it look?
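A minimal sketch of the kind of check being discussed, on a toy node type rather than the real AST. `SketchNode`, `isQuantifiable`, `validate`, and `SketchQuantError` are hypothetical names, and only `\b` and `\B` are modeled as unquantifiable escapes.

```swift
// Toy AST, not the real AST.Node.
indirect enum SketchNode {
  case literal(Character)
  case wordBoundary              // \b
  case notWordBoundary           // \B
  case group(SketchNode)
  case quantified(SketchNode)    // child followed by +, *, ? or {n,m}
}

enum SketchQuantError: Error, Equatable { case notQuantifiable }

extension SketchNode {
  // Zero-width assertion escapes cannot be the direct child of a quantifier.
  var isQuantifiable: Bool {
    switch self {
    case .wordBoundary, .notWordBoundary:
      return false
    case .literal, .group, .quantified:
      return true
    }
  }
}

// Validate children first, then check the quantifier's direct child.
func validate(_ node: SketchNode) throws {
  switch node {
  case .literal, .wordBoundary, .notWordBoundary:
    return
  case .group(let child):
    try validate(child)
  case .quantified(let child):
    try validate(child)
    guard child.isQuantifiable else { throw SketchQuantError.notQuantifiable }
  }
}
```

In this sketch `\b+` is rejected, but `(\b)+` is accepted because only the direct child is inspected. Whether a grouped assertion should also be rejected is a design choice the sketch does not settle.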

natecook1000 · 2022-05-09T18:42:44Z

Sources/_RegexParser/Regex/Parse/Sema.swift

+        .extendedPictographic, .graphemeLink, .hyphen, .otherAlphabetic,
+        .otherDefaultIgnorableCodePoint, .otherGraphemeExtended,
+        .otherIDContinue, .otherIDStart, .otherLowercase, .otherMath,
+        .otherUppercase, .prependedConcatenationMark:


For these .other* properties, we don't implement b/c they're included in e.g. \p{isAlphabetic}. Do you think we should redirect people to the corresponding properties?

Makes sense, and I can take care of that in a follow-up.
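One possible shape for such a follow-up, as a sketch only: a lookup from each contributory `Other_*` Unicode property to the derived property that includes it, used to build the diagnostic text. The table name, the function, and the message wording are all hypothetical; the pairings follow the Unicode derived-property definitions.

```swift
// Contributory property -> derived property that includes it.
let sketchRedirects: [String: String] = [
  "Other_Alphabetic": "Alphabetic",
  "Other_Default_Ignorable_Code_Point": "Default_Ignorable_Code_Point",
  "Other_Grapheme_Extend": "Grapheme_Extend",
  "Other_ID_Continue": "ID_Continue",
  "Other_ID_Start": "ID_Start",
  "Other_Lowercase": "Lowercase",
  "Other_Math": "Math",
  "Other_Uppercase": "Uppercase",
]

// Build a diagnostic for an unsupported property, adding a suggestion
// when a corresponding derived property is known.
func unsupportedPropertyMessage(_ name: String) -> String {
  if let replacement = sketchRedirects[name] {
    return "'\(name)' is not supported; use '\(replacement)' instead"
  }
  return "'\(name)' is not supported"
}
```

Properties without an entry, such as `Hyphen`, fall back to the plain unsupported message.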

natecook1000 · 2022-05-09T18:43:24Z

Sources/_RegexParser/Regex/Parse/Sema.swift

+    case .resetStartOfMatch, .singleDataUnit, .horizontalWhitespace,
+        .notHorizontalWhitespace, .verticalTab, .notVerticalTab,
+        // '\N' needs to be emitted using 'emitAny'.
+        .notNewline:


Still to implement, but I'll fix up the validation then.

- Make `\h` and `\H` supported for now
- Check character class ranges
- Diagnose unquantifiable escape sequences
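For the range checks mentioned in this commit, a minimal sketch under stated assumptions: a character class range needs single-character operands in ascending order, and a quantifier range needs its lower bound at or below its upper bound. `SketchAtom`, `SketchRangeError`, and both functions are hypothetical; `invalidCharacterRange` is an invented case, while the other two case names match the diff above.

```swift
// Toy operand of a character class range such as `a-z`.
enum SketchAtom {
  case char(Character)
  case escapedClass(String)   // e.g. \d, which is not a single character
}

enum SketchRangeError: Error, Equatable {
  case invalidCharacterClassRangeOperand
  case invalidCharacterRange(from: Character, to: Character)
  case invalidQuantifierRange(Int, Int)
}

// `[a-\d]` has an operand that is not a single character;
// `[z-a]` has its operands in the wrong order.
func validateCharacterClassRange(_ lhs: SketchAtom, _ rhs: SketchAtom) throws {
  guard case .char(let lower) = lhs, case .char(let upper) = rhs else {
    throw SketchRangeError.invalidCharacterClassRangeOperand
  }
  guard lower <= upper else {
    throw SketchRangeError.invalidCharacterRange(from: lower, to: upper)
  }
}

// `a{3,1}` has a lower bound above its upper bound.
func validateQuantifierRange(_ lower: Int, _ upper: Int) throws {
  guard lower <= upper else {
    throw SketchRangeError.invalidQuantifierRange(lower, upper)
  }
}
```

The ordering check here relies on `Character`'s `Comparable` conformance, which is a simplification; a real implementation would have to decide what ordering applies to the operands.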

hamishknight · 2022-05-09T19:45:55Z

@swift-ci please test

natecook1000

LGTM!

hamishknight requested review from milseman and natecook1000 May 5, 2022 12:23

hamishknight mentioned this pull request May 5, 2022

AST post-processing #116

Closed

hamishknight force-pushed the sema branch 2 times, most recently from 0df23fa to 9f4a821 Compare May 9, 2022 14:35

milseman approved these changes May 9, 2022

hamishknight added 3 commits May 9, 2022 17:15

Introduce ASTStage parameter to parse

9740416

Validate capture lists

466b375

hamishknight force-pushed the sema branch from 9f4a821 to 466b375 Compare May 9, 2022 16:17

hamishknight commented May 9, 2022

natecook1000 reviewed May 9, 2022

Address review feedback

c95e862

natecook1000 approved these changes May 9, 2022

hamishknight merged commit 7f068dc into swiftlang:main May 9, 2022

hamishknight deleted the sema branch May 9, 2022 19:54
