
is_token implementation is more permissive than standard #161

Open
declanvk opened this issue Apr 18, 2024 · 2 comments
Comments

@declanvk

RFC 9110 Section 5.6.2 defines the grammar for field value tokens as

  token          = 1*tchar

  tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                 / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                 / DIGIT / ALPHA
                 ; any VCHAR, except delimiters

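For reference, the tchar set above can be transcribed into a strict check like this (a sketch only; the names `is_rfc9110_tchar` and `is_rfc9110_token` are hypothetical and not httparse's actual implementation):

```rust
// Hypothetical strict token check, transcribed directly from the
// RFC 9110 Section 5.6.2 grammar quoted above.
fn is_rfc9110_tchar(b: u8) -> bool {
    matches!(
        b,
        b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*'
            | b'+' | b'-' | b'.' | b'^' | b'_' | b'`' | b'|' | b'~'
            | b'0'..=b'9' | b'A'..=b'Z' | b'a'..=b'z'
    )
}

fn is_rfc9110_token(s: &[u8]) -> bool {
    // token = 1*tchar: at least one character, all of them tchars
    !s.is_empty() && s.iter().all(|&b| is_rfc9110_tchar(b))
}

fn main() {
    assert!(is_rfc9110_token(b"Bread"));
    assert!(!is_rfc9110_token(b"hell[o")); // '[' is a delimiter, not a tchar
    assert!(!is_rfc9110_token(b""));
    println!("token checks passed");
}
```

Under this strict reading, `hell[o` would be rejected as a token because `[` is one of the excluded delimiters.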
However in practice it seems that this specific grammar is relaxed to a larger set of allowed characters. I made a small test which checks this behavior:

#[test]
fn test_header_with_invalid_rfc9110_token_in_value() {
    const RESPONSE: &[u8] =
        b"HTTP/1.1 200 OK\r\nAccess-Control-Allow-Credentials: hell[o\r\nBread: baguette\r\n\r\n";

    let mut headers = [EMPTY_HEADER; 2];
    let mut response = Response::new(&mut headers[..]);

    let result = crate::ParserConfig::default().parse_response(&mut response, RESPONSE);
    let status = result.unwrap();

    assert_eq!(status, Status::Complete(78));
    assert_eq!(response.version.unwrap(), 1);
    assert_eq!(response.code.unwrap(), 200);
    assert_eq!(response.reason.unwrap(), "OK");
    assert_eq!(response.headers.len(), 2);
    assert_eq!(response.headers[0].name, "Access-Control-Allow-Credentials");
    assert_eq!(response.headers[0].value, &b"hell[o"[..]);
    assert_eq!(response.headers[1].name, "Bread");
    assert_eq!(response.headers[1].value, &b"baguette"[..]);
}

I also checked how Firefox handles this header with a small program

printf "HTTP/1.1 200 OK\r\nAccess-Control-Allow-Credentials: hell[o\r\nBread: baguette\r\n\r\n" | nc -l 4040

Then visited localhost:4040 in Firefox. The recorded execution also showed the delimiter in the token value:

[Screenshot omitted: Firefox devtools showing the header value with the [ delimiter intact]

Based on the test assertions and the Firefox screenshot, it seems consistent that both programs allow characters outside the grammar specified in RFC 9110.

  • Was this intentional at the time of writing the tokenization code?
  • Would it be possible to have a strict mode which would reject field values containing invalid tokens?
@seanmonstar
Owner

It's true that the is_token method differs from the spec, which was motivated by real-world traffic. However, the parts you're showing, header values, are not defined to be tokens even in the spec. Header values are defined as:

  field-value    = *field-content
  field-content  = field-vchar
                   [ 1*( SP / HTAB / field-vchar ) field-vchar ]
  field-vchar    = VCHAR / obs-text
  obs-text       = %x80-FF

Which allows [ and plenty more in header values.
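The field-value grammar quoted above can be sketched as a validator like this (hypothetical names, not httparse code), which shows why `[` passes:

```rust
// Hypothetical checker for the RFC 9110 field-value grammar:
// field-vchar = VCHAR (0x21-0x7E) / obs-text (0x80-0xFF),
// with SP / HTAB permitted between visible characters.
fn is_field_vchar(b: u8) -> bool {
    matches!(b, 0x21..=0x7E | 0x80..=0xFF)
}

fn is_valid_field_value(v: &[u8]) -> bool {
    if v.is_empty() {
        return true; // *field-content permits an empty value
    }
    // All bytes must be field-vchar or interior SP / HTAB...
    let inner_ok = v
        .iter()
        .all(|&b| is_field_vchar(b) || b == b' ' || b == b'\t');
    // ...and field-content may not begin or end with SP / HTAB.
    inner_ok && is_field_vchar(v[0]) && is_field_vchar(v[v.len() - 1])
}

fn main() {
    assert!(is_valid_field_value(b"hell[o")); // '[' is a valid field-vchar
    assert!(is_valid_field_value(b"baguette"));
    assert!(!is_valid_field_value(b"bad\x07byte")); // control chars excluded
    println!("field-value checks passed");
}
```

So a value like `hell[o` is spec-valid even under a strict reading of the field-value grammar, since `[` falls inside the VCHAR range.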

@declanvk
Author

declanvk commented Apr 29, 2024

Thanks for clarifying! On re-reading, I think only field names are defined as tokens: https://www.rfc-editor.org/rfc/rfc9110#section-5.1-2. I also see token used in a couple of other places, such as methods, parameter names, connection options, etc.

However, looking at `fn is_token(b: u8) -> bool`, I only see is_token used to parse tokens and in URI parsing.
