fix(parser): ensure all loops advance parsing, fuzz with arbitrary bytes #828

goto-bus-stop · 2024-02-09T11:03:07Z

This addresses an issue where the parser could get stuck in a loop if the token limit was reached at a specific point. Most egregiously, the parser would go into an infinite loop if the token limit was reached in the middle of a fragment spread: { ... <LIMIT> fragmentName }.

To catch cases like this, this PR introduces p.peek_while(), which replaces while let Some() = p.peek() in the parser. In debug mode, peek_while() asserts that each iteration advanced parsing. My current solution for cases that did not advance parsing is to change error reports to use p.err_and_pop(), which consumes the token that caused the error. This probably isn't always the most correct choice for error tolerance, since the token may be a closing token or something that could be interpreted better, but we're not yet diligent about that to begin with, and running into an infinite loop is worse :)

p.peek_while() requires you to return ControlFlow, which lets you break out of the loop when the peeked token does not match an expected token.
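To illustrate the idea, here is a minimal sketch of a loop helper that returns ControlFlow and asserts progress in debug builds. The `Parser`, `Kind`, and `parse_names` names are toy stand-ins made up for this example; the actual signatures in apollo-parser may differ.

```rust
use std::ops::ControlFlow;

/// Toy token kinds (an assumption for this sketch; the real token set is larger).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Name,
    Comma,
    RBrace,
}

/// Hypothetical stand-in for the parser: a token list plus a cursor.
struct Parser {
    tokens: Vec<Kind>,
    pos: usize,
}

impl Parser {
    /// Looks at the next token without consuming it.
    fn peek(&self) -> Option<Kind> {
        self.tokens.get(self.pos).copied()
    }

    /// Consumes the next token, if there is one.
    fn pop(&mut self) -> Option<Kind> {
        let kind = self.peek()?;
        self.pos += 1;
        Some(kind)
    }

    /// Calls `run` for each peeked token until it returns `Break` or the
    /// input runs out. In debug builds, panics if an iteration asked to
    /// continue without consuming anything.
    fn peek_while(&mut self, mut run: impl FnMut(&mut Parser, Kind) -> ControlFlow<()>) {
        while let Some(kind) = self.peek() {
            let before = self.pos;
            if run(self, kind).is_break() {
                break;
            }
            debug_assert!(
                self.pos > before,
                "peek_while() iteration did not advance parsing"
            );
        }
    }
}

/// Consumes a run of names and commas, stopping at the first other token.
/// Returns how many names were seen.
fn parse_names(p: &mut Parser) -> usize {
    let mut names = 0;
    p.peek_while(|p, kind| match kind {
        Kind::Name => {
            names += 1;
            p.pop();
            ControlFlow::Continue(())
        }
        Kind::Comma => {
            p.pop();
            ControlFlow::Continue(())
        }
        // Unexpected token: leave it for the caller and stop looping.
        _ => ControlFlow::Break(()),
    });
    names
}
```

Returning `Break` is the only way to leave a token unconsumed, so a forgotten `pop()` on a `Continue` path trips the assertion instead of spinning forever.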

p.peek_while_kind() supports a simpler case where you expect one specific token kind to come up every time. If that token kind is spotted, you must parse at least that token. You can't break out of the loop early; it ends when the condition no longer matches.
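A rough sketch of what such a single-kind loop could look like, again with toy `Parser` and `Kind` types invented for this example rather than the real apollo-parser ones:

```rust
/// Toy token kinds (an assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Name,
    Pipe,
}

/// Hypothetical stand-in for the parser: a token list plus a cursor.
struct Parser {
    tokens: Vec<Kind>,
    pos: usize,
}

impl Parser {
    /// Looks at the next token without consuming it.
    fn peek(&self) -> Option<Kind> {
        self.tokens.get(self.pos).copied()
    }

    /// Consumes the next token, if there is one.
    fn pop(&mut self) {
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
    }

    /// Runs `run` for as long as the next token has the expected kind.
    /// There is no early exit: the loop ends only when the kind stops
    /// matching. In debug builds, panics if `run` consumed nothing.
    fn peek_while_kind(&mut self, expect: Kind, mut run: impl FnMut(&mut Parser)) {
        while self.peek() == Some(expect) {
            let before = self.pos;
            run(self);
            debug_assert!(
                self.pos > before,
                "peek_while_kind() iteration did not advance parsing"
            );
        }
    }
}

/// Counts and consumes the leading run of `Name` tokens.
fn count_names(p: &mut Parser) -> usize {
    let mut count = 0;
    p.peek_while_kind(Kind::Name, |p| {
        p.pop();
        count += 1;
    });
    count
}
```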

p.parse_separated_list() supports the various separated lists with optional prefix in the spec: & Interface, | Union, and | DirectiveLocation. Using this also makes directive location parsing no longer recursive (it was recursive but didn't have a recursion limit until now! 😱 )
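For illustration, a separated list with an optional leading separator can be parsed with a plain loop instead of recursion. This is a simplified sketch over a toy `Token` type; the function below shares only its name with the helper in this PR, and its signature is an assumption.

```rust
/// Toy tokens (an assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Name(String),
    Pipe,
    Amp,
}

/// Parses `separator? Name (separator Name)*` iteratively.
/// Every loop iteration either consumes at least one token or returns,
/// so the loop cannot spin and the stack depth stays constant.
fn parse_separated_list(tokens: &[Token], separator: &Token) -> Result<Vec<String>, String> {
    let mut pos = 0;

    // Optional leading separator, e.g. `| QUERY | FIELD`.
    if tokens.get(pos) == Some(separator) {
        pos += 1;
    }

    let mut items = Vec::new();
    loop {
        match tokens.get(pos) {
            Some(Token::Name(name)) => {
                items.push(name.clone());
                pos += 1;
            }
            _ => return Err(format!("expected a name at token {pos}")),
        }

        // Keep going only if another separator follows.
        if tokens.get(pos) == Some(separator) {
            pos += 1;
        } else {
            break;
        }
    }
    Ok(items)
}
```

Passing `&Token::Amp` or `&Token::Pipe` as the separator covers the `&`- and `|`-separated forms with the same loop.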

Root operation type parsing is no longer recursive.

This also adds a fuzz test that feeds completely arbitrary strings to the parser. All our other fuzz tests generate a document with apollo-smith, which is useful in its own way but doesn't do a great job of finding issues in error edge cases like this. The new fuzz test has a token limit as well. I found a few more cases that could panic with this:

- my own non-recursive root operation type parsing 🤪
- stray top-level string value at the token limit
- token limit reached in the middle of parsing a type reference

This also fixes a case where bogus input was accepted, and did not raise a parse error:

```graphql
schema { query: Query { mutation: Mutation { subscription: Subscription }
```


crates/apollo-parser/src/parser/grammar/document.rs

crates/apollo-parser/src/parser/grammar/operation.rs

goto-bus-stop · 2024-02-09T11:05:01Z

crates/apollo-parser/src/parser/grammar/selection.rs

-                    p.err("expected at least one Selection in Selection Set");
+                // If there is no token,
+                None => {
+                    p.err_and_pop("expected an Inline Fragment or a Fragment Spread");


changing this from p.err() to p.err_and_pop() fixes the infinite loop.

Could a similar infinite loop easily happen if another call to p.err() is added elsewhere in the future that should be p.err_and_pop() instead? What would be a good place to warn against this gotcha? The doc-comment for p.err() may be easy to miss.

I don't have a good answer for this; I will merge and file an issue.
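To make the gotcha concrete, here is a small sketch of why the non-consuming variant can hang a loop. The `Parser` type and `parse_all` function are made up for this example, and the iteration budget exists only so the stuck case can be observed without actually hanging.

```rust
/// Hypothetical stand-in for the parser: tokens, a cursor, and collected errors.
struct Parser {
    tokens: Vec<&'static str>,
    pos: usize,
    errors: Vec<String>,
}

impl Parser {
    /// Records an error but leaves the offending token in place.
    fn err(&mut self, message: &str) {
        self.errors.push(message.to_string());
    }

    /// Records an error and consumes the offending token.
    fn err_and_pop(&mut self, message: &str) {
        self.err(message);
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
    }
}

/// Loops over the input expecting only `...` tokens.
/// Returns `true` if the end of input was reached within `budget` iterations,
/// and `false` if the loop was still running when the budget ran out.
fn parse_all(p: &mut Parser, consume_on_error: bool, budget: usize) -> bool {
    for _ in 0..budget {
        match p.tokens.get(p.pos).copied() {
            None => return true,
            Some("...") => p.pos += 1,
            // Consuming the bad token guarantees the next iteration sees new input.
            Some(_) if consume_on_error => p.err_and_pop("expected `...`"),
            // Not consuming it means the next iteration peeks the same token again.
            Some(_) => p.err("expected `...`"),
        }
    }
    false
}
```

With the input `["...", "}", "..."]`, the consuming variant finishes with one error, while the non-consuming variant never moves past `}` and only stops because of the budget.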

crates/apollo-parser/test_data/parser/err/0054_root_operation_type_with_extra_brackets.graphql

crates/apollo-parser/src/parser/mod.rs

fuzz/fuzz_targets/parser_limited.rs

crates/apollo-parser/src/parser/grammar/argument.rs

crates/apollo-parser/src/parser/grammar/document.rs

SimonSapin · 2024-02-09T12:59:17Z


Could a similar infinite loop easily happen if another call to p.err() is added elsewhere in the future that should be p.err_and_pop() instead? What would be a good place to warn against this gotcha? The doc-comment for p.err() may be easy to miss.

crates/apollo-parser/src/parser/grammar/ty.rs

goto-bus-stop added 9 commits February 9, 2024 10:27

fix(parser): ensure all loops advance parsing

6a8e7d3

fix(parser): write root operation parsing with a loop

994319d

This also fixes a case where bogus input was accepted, and did not raise a parse error:

```graphql
schema { query: Query { mutation: Mutation { subscription: Subscription }
```

chore(parser): add peek_while_kind variant for simple loops

fb20467

fix(parser): add parse_separated_list helper; remove recursion from d…

8c6c9e9

…irective locations parser

add test that fails on main and is fixed here

802119e

Add fuzz target parsing arbitrary strings with token limit

493e74a

fix(parser): always consume token in operation_type() parser

ec16327

add failing test: stray StringValue at token limit

ac63c69

fix(parser): remove unwrap that may trigger with token limits

3258d9c

goto-bus-stop requested review from lrlna and SimonSapin as code owners February 9, 2024 11:03


goto-bus-stop added 3 commits February 9, 2024 12:19

fix(parser): fix panic if token limit is reached mid-type

678a27d

fix(parser): remove unwrap from enum_value parser

6905a8a

Merge branch 'main' into renee/no-parse-loop

a9ea3a4

SimonSapin mentioned this pull request Feb 9, 2024

parser::ty::named_type() can silently do nothing #829

Open

SimonSapin approved these changes Feb 9, 2024


goto-bus-stop added 3 commits February 9, 2024 15:53

Use peek_while_kind

880b1f4

Tweak fuzz comment

ba352d7

Merge branch 'main' into renee/no-parse-loop

7a6d7c7

goto-bus-stop mentioned this pull request Feb 13, 2024

Avoid misuse in error handling in apollo-parser #833

Open

goto-bus-stop merged commit d545a6a into main Feb 13, 2024
12 checks passed

goto-bus-stop deleted the renee/no-parse-loop branch February 13, 2024 15:37

goto-bus-stop mentioned this pull request Feb 14, 2024

[email protected] #836

Merged

goto-bus-stop mentioned this pull request Mar 26, 2024

testing apollo-rs on completely gibberish graphql #304

Closed
