fix(parser): ensure all loops advance parsing, fuzz with arbitrary bytes #828

Merged: 15 commits merged into main from renee/no-parse-loop on Feb 13, 2024


Member

@goto-bus-stop goto-bus-stop commented Feb 9, 2024

This addresses an issue where the parser could get stuck in a loop if the token limit was reached at a specific point. Most egregiously, the parser would get into an infinite loop if the token limit was reached in the middle of a fragment spread: { ... <LIMIT> fragmentName }.

To catch cases like this, this PR introduces p.peek_while(), which replaces while let Some() = p.peek() in the parser. In debug mode, peek_while() asserts that each iteration advanced parsing. My current solution for cases that did not advance parsing is to change error reports to use p.err_and_pop(), which consumes the token that caused the error. This probably isn't always the most correct thing for error tolerance, since the token may be a closing token or something that could be interpreted more usefully, but we're not yet diligent about that to begin with, and running into an infinite loop is worse :)
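To make the difference between p.err() and p.err_and_pop() concrete, here is a minimal sketch on a toy token stream. The Parser struct, parse_names(), and the iteration cap are all hypothetical and only borrow the method names from this PR; the real apollo-parser internals are not shown here.

```rust
/// Toy token-stream parser. Hypothetical: only the method names mirror the PR.
struct Parser<'a> {
    tokens: &'a [&'a str],
    pos: usize,
    errors: Vec<String>,
}

impl<'a> Parser<'a> {
    fn new(tokens: &'a [&'a str]) -> Self {
        Parser { tokens, pos: 0, errors: Vec::new() }
    }

    /// Look at the current token without consuming it.
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    /// Record an error without consuming anything.
    fn err(&mut self, message: &str) {
        self.errors.push(message.to_string());
    }

    /// Record an error and consume the offending token.
    fn err_and_pop(&mut self, message: &str) {
        self.err(message);
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
    }
}

/// Parses a run of alphabetic names and reports anything else as an error.
/// Returns the number of loop iterations, or None if the iteration cap was
/// hit, which in this sketch stands in for "the loop never terminates".
fn parse_names(p: &mut Parser<'_>, pop_on_error: bool, max_iterations: usize) -> Option<usize> {
    let mut iterations = 0;
    while let Some(token) = p.peek() {
        if iterations == max_iterations {
            return None;
        }
        iterations += 1;
        if token.chars().all(|c| c.is_ascii_alphabetic()) {
            p.pos += 1; // consume the name
        } else if pop_on_error {
            p.err_and_pop("expected a name"); // advances past the bad token
        } else {
            p.err("expected a name"); // does not advance: same token next time
        }
    }
    Some(iterations)
}
```

With the input a ! b, the err_and_pop() variant finishes in three iterations and records one error, while the err() variant keeps peeking at the same ! token until the cap stops it.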

p.peek_while() requires you to return a ControlFlow, which lets you break out of the loop when the peeked token does not match an expected token.
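A rough sketch of what a peek_while()-style helper could look like, assuming a toy parser over string tokens. The signature, the closure arguments, and parse_selection_names() are assumptions for illustration and may differ from the actual implementation in this PR.

```rust
use std::ops::ControlFlow;

/// Toy parser over string tokens (hypothetical, not the apollo-parser type).
struct Parser<'a> {
    tokens: &'a [&'a str],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    fn pop(&mut self) -> Option<&'a str> {
        let token = self.peek()?;
        self.pos += 1;
        Some(token)
    }

    /// Calls `run` for each peeked token until the input ends or `run` breaks.
    /// In debug builds, asserts that every continuing iteration consumed input.
    fn peek_while(&mut self, mut run: impl FnMut(&mut Self, &'a str) -> ControlFlow<()>) {
        while let Some(token) = self.peek() {
            let before = self.pos;
            if let ControlFlow::Break(()) = run(self, token) {
                break;
            }
            debug_assert!(self.pos > before, "peek_while() iteration must advance parsing");
        }
    }
}

/// Collects tokens up to, but not including, a closing brace.
fn parse_selection_names<'a>(p: &mut Parser<'a>) -> Vec<&'a str> {
    let mut names = Vec::new();
    p.peek_while(|p, token| {
        if token == "}" {
            // Not something this loop handles: stop and leave it for the caller.
            return ControlFlow::Break(());
        }
        p.pop();
        names.push(token);
        ControlFlow::Continue(())
    });
    names
}
```

Returning Continue without consuming anything trips the debug assertion, so a loop that fails to advance shows up in tests instead of hanging.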

p.peek_while_kind() supports a simpler case where you expect one specific token kind to come up every time. If that token kind is spotted, you must parse at least that token. You can't break out of the loop early; it ends when the condition no longer matches.
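For comparison, a peek_while_kind()-style helper might be sketched as below. The Kind enum, the token representation, and parse_directive_names() are made up for this example; only the idea (loop while the next token has the expected kind, and assert progress) comes from the description above.

```rust
/// Hypothetical token kinds for the sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    At,
    Name,
}

/// Toy parser over (kind, text) pairs.
struct Parser<'a> {
    tokens: &'a [(Kind, &'a str)],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek_kind(&self) -> Option<Kind> {
        self.tokens.get(self.pos).map(|token| token.0)
    }

    /// Consume the current token and return its text.
    fn bump(&mut self) -> Option<&'a str> {
        let text = self.tokens.get(self.pos)?.1;
        self.pos += 1;
        Some(text)
    }

    /// Runs `run` for as long as the next token has kind `expect`.
    /// There is no early exit: the loop ends when the kind stops matching.
    fn peek_while_kind(&mut self, expect: Kind, mut run: impl FnMut(&mut Self)) {
        while self.peek_kind() == Some(expect) {
            let before = self.pos;
            run(self);
            debug_assert!(self.pos > before, "peek_while_kind() iteration must advance parsing");
        }
    }
}

/// Parses a run of `@name` pairs and returns the names.
fn parse_directive_names<'a>(p: &mut Parser<'a>) -> Vec<&'a str> {
    let mut names = Vec::new();
    p.peek_while_kind(Kind::At, |p| {
        p.bump(); // always consume the `@` that made the condition match
        if p.peek_kind() == Some(Kind::Name) {
            if let Some(name) = p.bump() {
                names.push(name);
            }
        }
    });
    names
}
```

Because the callback always consumes the @ token that triggered it, each iteration is guaranteed to advance even when the following name is missing.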

p.parse_separated_list() supports the various separated lists with optional prefix in the spec: & Interface, | Union, and | DirectiveLocation. Using this also makes directive location parsing no longer recursive (it was recursive but didn't have a recursion limit until now! 😱 )
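The separated-list shape (optional leading separator, then items joined by the separator) can be handled with a plain loop instead of recursion. The following is only a sketch under assumed names: the Parser struct, eat(), parse_name(), and the exact signature of parse_separated_list() are hypothetical.

```rust
/// Toy parser over string tokens (hypothetical, not the apollo-parser type).
struct Parser<'a> {
    tokens: &'a [&'a str],
    pos: usize,
    errors: Vec<String>,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    /// Consume the current token if it equals `expected`.
    fn eat(&mut self, expected: &str) -> bool {
        if self.peek() == Some(expected) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Parses `sep? item (sep item)*` iteratively.
    /// Every extra iteration has consumed a separator, so the loop always advances.
    fn parse_separated_list(
        &mut self,
        separator: &str,
        mut parse_item: impl FnMut(&mut Self) -> Option<&'a str>,
    ) -> Vec<&'a str> {
        let mut items = Vec::new();
        self.eat(separator); // optional leading separator
        loop {
            match parse_item(self) {
                Some(item) => items.push(item),
                None => self.errors.push("expected a list item".to_string()),
            }
            if !self.eat(separator) {
                break;
            }
        }
        items
    }
}

/// Consumes one alphanumeric name, or returns None without consuming anything.
fn parse_name<'a>(p: &mut Parser<'a>) -> Option<&'a str> {
    let token = p.peek()?;
    if !token.is_empty() && token.chars().all(|c| c.is_ascii_alphanumeric()) {
        p.pos += 1;
        Some(token)
    } else {
        None
    }
}
```

Here | A | B | C and A | B both parse with the same call, and a doubled separator such as A & & B yields one error while still reaching B.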

Root operation type parsing is no longer recursive.

This also adds a fuzz test using completely arbitrary strings as input to the parser--all our other fuzz tests generate a document with apollo-smith, which is useful in its own way, but doesn't do a great job of finding issues in error edge cases like this. The new fuzz test has a token limit as well. I found a few more cases that could panic with this:

  • my own non-recursive root operation type parsing 🤪
  • stray top-level string value at the token limit
  • token limit reached in the middle of parsing a type reference
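The general shape of fuzzing with arbitrary bytes and a token limit can be sketched without any fuzzing crate. Everything below is a stand-in: parse_with_limit() is a hypothetical toy rather than the real parser, and the xorshift generator only replaces what a real fuzzer would provide. The actual fuzz target in this PR is not reproduced here.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Hypothetical stand-in for the parser under test: consumes
/// whitespace-separated tokens until the input or the token limit runs out.
/// Returns the number of tokens consumed.
fn parse_with_limit(input: &str, token_limit: usize) -> usize {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    let mut consumed = 0;
    let mut iterations = 0;
    while consumed < tokens.len() && consumed < token_limit {
        iterations += 1;
        // A loop that runs more often than there are tokens is not advancing.
        assert!(iterations <= tokens.len(), "parser stopped advancing");
        consumed += 1;
    }
    consumed
}

/// One fuzz iteration: arbitrary bytes in, no panic and no hang expected.
/// Returns false when the bytes are not valid UTF-8 and were skipped.
fn fuzz_one(data: &[u8], token_limit: usize) -> bool {
    match std::str::from_utf8(data) {
        Ok(input) => {
            parse_with_limit(input, token_limit);
            true
        }
        Err(_) => false,
    }
}

/// Feeds `rounds` pseudo-random byte strings to `fuzz_one`.
/// Returns how many of them were valid UTF-8 and reached the parser.
fn run_smoke_fuzz(seed: u64, rounds: usize) -> usize {
    let mut rng = XorShift(seed);
    let mut parsed = 0;
    for _ in 0..rounds {
        let len = (rng.next_u64() % 16) as usize;
        let data: Vec<u8> = (0..len).map(|_| (rng.next_u64() % 256) as u8).collect();
        let token_limit = (rng.next_u64() % 8) as usize;
        if fuzz_one(&data, token_limit) {
            parsed += 1;
        }
    }
    parsed
}
```

Varying the token limit together with the input is what lets a harness like this hit cases such as a limit landing in the middle of a fragment spread.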

p.err("expected at least one Selection in Selection Set");
// If there is no token,
None => {
p.err_and_pop("expected an Inline Fragment or a Fragment Spread");
Member Author

Changing this from p.err() to p.err_and_pop() fixes the infinite loop.

Contributor

Could a similar infinite loop easily happen if another call to p.err() that should be p.err_and_pop() is added elsewhere in the future? What would be a good place to warn against this gotcha? The doc-comment for p.err() may be easy to miss.

Member Author

I don't have a good answer for this. I will merge and file an issue.

@goto-bus-stop goto-bus-stop merged commit d545a6a into main Feb 13, 2024
12 checks passed
@goto-bus-stop goto-bus-stop deleted the renee/no-parse-loop branch February 13, 2024 15:37
@goto-bus-stop goto-bus-stop mentioned this pull request Feb 14, 2024