
[Feature request] Accept Regex expressions in Scan #21

Open
mofeing opened this issue Nov 29, 2023 · 1 comment

Labels
enhancement New feature or request


mofeing commented Nov 29, 2023

Proposal

I often find that some clauses are more easily parsed with a regex than with PikaParser clauses. The current workaround is to use a Scan in a way similar to:

rules = Dict(
    ...,
    :id => PikaParser.scan() do x
        matched = match(r"^[a-z][a-zA-Z0-9_]*", x)
    isnothing(matched) && return -1  # no match; returning 0 would report a successful empty match
        length(matched.match)
    end,
    ...,
)

It would be great if we could just pass the regex to scan.
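A minimal sketch of what accepting a regex could look like. It assumes, as in the workaround above, that the Scan callback receives a string view of the remaining input and returns the match length, and additionally that a negative return value means "no match". The helper name regex_scan is hypothetical and not part of PikaParser; it enforces the leading ^ requirement discussed under "Unsolved issues" and, like the workaround, returns the length in characters.

```julia
# Hypothetical helper, not part of PikaParser: wraps an anchored regex into a
# callback shaped like the Scan function in the workaround above.
function regex_scan(re::Regex)
    # Per the proposal, accept only patterns that start with ^.
    startswith(re.pattern, "^") ||
        throw(ArgumentError("regex must start with ^, got: $(re.pattern)"))
    return function (x::AbstractString)
        m = match(re, x)
        # Assumption: a negative length tells Scan that nothing matched.
        isnothing(m) && return -1
        # Match length in characters, as in the workaround.
        return length(m.match)
    end
end

# Equivalent of the scanner used by the :id rule.
id_scan = regex_scan(r"^[a-z][a-zA-Z0-9_]*")
```

Assuming scan accepts any callable of this shape, the rule could then be written as :id => PikaParser.scan(regex_scan(r"^[a-z][a-zA-Z0-9_]*")).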

Unsolved issues

Only regexes of the form r"^..." should be accepted. Without the ^ anchor, the regex searches for the pattern anywhere in the remaining input instead of only at the current position.
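Requiring a leading ^ is not airtight either: in r"^a|b" only the first alternative is anchored, so b can still be found anywhere in the input. One possible alternative, sketched here under the same assumptions about the Scan callback, is to anchor every pattern at match time by passing PCRE's ANCHORED option through the optional fourth argument of Julia's match(re, s, idx, addopts). Base.PCRE.ANCHORED is an internal, unexported constant of Base, so this relies on implementation details; the helper name anchored_scan is hypothetical.

```julia
# Hypothetical helper: match `re` only at the start of `x`, even if the
# pattern has no leading ^. Base.PCRE.ANCHORED is an internal, unexported
# constant of Julia's PCRE wrapper, so this relies on Base internals.
function anchored_scan(re::Regex)
    return function (x::Union{String,SubString{String}})
        # Try a match at the first index only; PCRE will not move forward.
        m = match(re, x, firstindex(x), Base.PCRE.ANCHORED)
        isnothing(m) && return -1   # assumption: negative length = no match
        return length(m.match)      # length in characters
    end
end

words = anchored_scan(r"[a-z]+")
```

Besides fixing the alternation case, anchoring avoids searching the whole rest of the input on every failed attempt.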

exaexa added the enhancement (New feature or request) label Dec 1, 2023
exaexa self-assigned this Dec 1, 2023
exaexa (Collaborator) commented Dec 1, 2023

notes for self when I get to this:

  • We MIGHT want to abuse the do notation for writing folds directly into the grammar, as in Bison.
  • The usual regex semantics are slightly inconvenient for exact matching, and unfortunately we don't have much freedom in regex implementations to support syntax similar to, e.g., flex. To keep things simple, I'd suggest two helper functions: regex_to (scans everything up to and including the match) and regex_before (scans everything up to the match, excluding it), each with an optional argument selecting which match group is used. (This can be done by taking .offset and .ncodeunits from m.match or m.captures[N].)
  • The "to" and "before" variants would also let users implement useful constructs such as not_followed_by directly in the lexer.
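The regex_to and regex_before helpers proposed above could look roughly like the following. This is a sketch, not an existing PikaParser API: it assumes the callback gets a String or SubString{String} view, returns lengths in code units (following the .offset/.ncodeunits suggestion), and uses -1 for "no match". It reads the 1-based offset and offsets fields of the RegexMatch; the offset field of a SubString is 0-based and counted from its parent String, so using it directly invites off-by-one errors when the input is itself a view.

```julia
# Hypothetical helpers named after the proposal above; not an existing API.
# Both search forward from the start of `x` and return a length in code
# units (bytes), or -1 if the pattern or the selected group did not match.

# Length of everything up to and including group `group` (0 = whole match).
function regex_to(re::Regex, x::Union{String,SubString{String}}, group::Int=0)
    m = match(re, x)
    isnothing(m) && return -1
    group == 0 && return m.offset - 1 + ncodeunits(m.match)
    cap = m.captures[group]
    isnothing(cap) && return -1          # group did not take part in the match
    return m.offsets[group] - 1 + ncodeunits(cap)
end

# Length of everything before group `group` (0 = whole match), excluding it.
function regex_before(re::Regex, x::Union{String,SubString{String}}, group::Int=0)
    m = match(re, x)
    isnothing(m) && return -1
    group == 0 && return m.offset - 1
    isnothing(m.captures[group]) && return -1
    return m.offsets[group] - 1
end
```

Unlike the anchored scanners, these deliberately search forward, which is what lets them consume everything up to a terminator. For example, a block-comment body might be scanned with PikaParser.scan(x -> regex_before(r"\*/", x)), leaving the */ for a following token clause, assuming scan accepts such a callback and measures lengths in code units.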
