Parsing C-style identifiers #711
Unanswered
maartenflippo asked this question in Q&A
Replies: 1 comment
-
The usual approach is to change your definitions slightly:
Of course, not all numbers necessarily correspond to a semantically valid number, but that's usually something caught by a later pass.
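The reply's own code did not survive, so the following reading is an assumption: the number rule absorbs any identifier characters that directly follow it, and a later pass checks that the text is a valid number. The sketch below is a hand-rolled lexer in plain Rust, not the parser-combinator API from the question. The `Token` enum, `lex`, and `check_numbers` are hypothetical names used only for illustration. With this rule, "1foo" becomes a single `Num("1foo")` token, which the validation pass then rejects.

```rust
/// Tokens produced by this illustrative lexer (hypothetical names).
#[derive(Debug, PartialEq)]
enum Token {
    Num(String),
    Ident(String),
}

/// Characters that may appear inside a C-style identifier.
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Lexes `src` into tokens. A number starts with a digit but, like an
/// identifier, keeps consuming every following identifier character, so
/// "1foo" is one (possibly invalid) number token instead of two tokens.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if is_ident_char(c) {
            let start = i;
            while i < chars.len() && is_ident_char(chars[i]) {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            // The first character decides the token kind.
            if c.is_ascii_digit() {
                tokens.push(Token::Num(text));
            } else {
                tokens.push(Token::Ident(text));
            }
        } else {
            return Err(format!("unexpected character {:?} at {}", c, i));
        }
    }
    Ok(tokens)
}

/// The "later pass": a number token is only valid if it is all digits.
fn check_numbers(tokens: &[Token]) -> Result<(), String> {
    for tok in tokens {
        if let Token::Num(text) = tok {
            if !text.chars().all(|c| c.is_ascii_digit()) {
                return Err(format!("invalid number literal {:?}", text));
            }
        }
    }
    Ok(())
}

fn main() {
    let tokens = lex("1foo bar 42").unwrap();
    println!("{:?}", tokens); // [Num("1foo"), Ident("bar"), Num("42")]
    println!("{:?}", check_numbers(&tokens)); // an Err naming "1foo"
}
```

Deferring the check keeps the lexer simple and lets the later pass report an error such as "invalid number literal" that is more specific than a generic lexer failure.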
-
I am trying to write a lexer for a C-style language, and I came across the following (to me) unexpected behavior. I am using version `1.0.0-alpha.7`. I have the following setup (simplified):
Let's say we are trying to parse "1foo" as a token. The `ident` parser fails, as expected, because identifiers cannot start with a digit. However, `constant` succeeds in parsing the "1", and then the `ident` parser succeeds in parsing "foo". As a result, the input "1foo" produces two tokens: `[Constant("1"), Ident("foo")]`.

Ideally, I would expect the `token` parser to fail. If I were using a regex, I would include a word boundary in the pattern, which would give me the behavior I want. My question is: is there an idiomatic way to achieve this? One possible solution is to change the `ident` parser to accept inputs starting with digits, and then use `.try_map` to reject identifiers that start with a digit. It works, but it feels a bit like a hack.

In my search for a solution, I found that the `nano_rust` example has exactly the same behavior: the lexer succeeds, and it is left to the parsing stage to identify the problem.
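The word-boundary behavior asked about above can be sketched as a hand-rolled lexer in plain Rust. This is not the parser-combinator library's API, and `Token` and `lex` are hypothetical names. After reading a run of digits, the lexer peeks at the next character without consuming it. It rejects the input if that character could continue an identifier, which is the hand-written equivalent of a regex `\b` or a negative lookahead.

```rust
/// Tokens produced by this illustrative lexer (hypothetical names).
#[derive(Debug, PartialEq)]
enum Token {
    Constant(String),
    Ident(String),
}

fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Lexes constants and identifiers. A constant must end at a "word
/// boundary": the character after its last digit (if any) must not be an
/// identifier character, so "1foo" is an error instead of two tokens.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            // Lookahead: inspect the next character without consuming it.
            if let Some(&next) = chars.get(i) {
                if is_ident_char(next) {
                    return Err(format!(
                        "identifier character {:?} directly after constant at {}",
                        next, i
                    ));
                }
            }
            tokens.push(Token::Constant(chars[start..i].iter().collect()));
        } else if is_ident_start(c) {
            let start = i;
            while i < chars.len() && is_ident_char(chars[i]) {
                i += 1;
            }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else {
            return Err(format!("unexpected character {:?} at {}", c, i));
        }
    }
    Ok(tokens)
}

fn main() {
    println!("{:?}", lex("1 foo")); // Ok([Constant("1"), Ident("foo")])
    println!("{:?}", lex("1foo")); // an Err: 'f' directly follows the constant
}
```

Compared with the `.try_map` approach, this leaves the `ident` rule untouched and reports the error at the exact position where the constant runs into an identifier character.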