Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BREAKING: Overhaul HTTP headers (make case-insensitive, allow for multiples) #34

Merged
merged 6 commits into from
Oct 3, 2024

Conversation

Frixuu
Copy link
Contributor

@Frixuu Frixuu commented Sep 26, 2024

According to spec, HTTP header field names should be treated in a case-insensitive way.

However, so far this was not the case with Weblink. This meant a bunch of things were silently very broken. For example, a Node.js fetch() client could not send POST request bodies: the server was expecting a Content-Length header, but got content-length instead.

This implementation introduces a new abstract HeaderName (and a companion HeaderValue). When implicitly assigning a String to a HeaderName, it simply gets converted to lowercase (plus some ASCII validation). This way it can be used as a key even in default standard library collections and the comparisons would be case-insensitive.

Of course, this means the original string casing is lost – but given that HTTP/2 (not supported currently) forces sent header names to be lowercase, I deemed it an acceptable loss.


Also according to spec, multiple header values with the same name can coexist. They can be either set as separate key-value pairs (i.e. Set-Cookie) or as a comma-separated list (e.g. Accept-Language, If-Match).

Request did not support that usecase. Response did, but made it difficult to query. Both classes now use a specialized collection, a HeaderMap, that allows for multiple values to be associated with one key.

By converting used header names to lowercase, they can get compared in a case-insensitive way.

Coincidentally, they also MUST be lowercased when connecting with HTTP/2 (not implemented yet, but maybe in the future?)
The amusing thing about `HeaderName` being an `enum` is that its values get suggested in one's IDE autocomplete when necessary, so it's harder for users to make typos.

However, thanks to `:from` we can still implicitly convert any arbitrary string and use it as a header key.
This unifies handling headers between `Request` and `Response`.
Old server code that may have expected a single header value may be surprised to see multiple of them now, thanks to splitting known headers when parsing a `Request`.

Old client code that may have expected repeated header names may be surprised to see a single value, comma- or semicolon-delimited, thanks to merging known headers when sending a `Response`.

NOTE: As of this commit, Weblink does not parse quoted " " strings nor CRLF folds in header values correctly. It never used to, but it still does not, too.
@Frixuu Frixuu changed the title BREAKING: Make header comparisons case-insensitive BREAKING: Overhaul HTTP headers (make case-insensitive, allow for multiples) Sep 27, 2024
@Frixuu
Copy link
Contributor Author

Frixuu commented Sep 27, 2024

This PR has ballooned more than I expected, so it's time for me to give you a quick status update.

What should work (as I understand it):

  • Incoming and outgoing headers' names are compared in a case-insensitive way.
  • Incoming and outgoing headers' names are checked for non-ASCII bytes.
  • Incoming and outgoing headers' values are checked for control characters.
  • Incoming headers which are known to have list-type values separate them.
  • Outgoing headers which are known to have list-type values merge them together.
  • Outgoing headers which are known to be unique are checked for the existence of multiple values.

What's not done yet, but I'm not sure if I want to cram it into the same patch:

  • Incoming header values with "quoted text" do not get recognized and unquoted
  • Incoming header values with line continuations ("LWS" in RFC 2616) do not get recognized as a single header.
  • Incoming data is not treated as ISO-8859-1, but always as UTF8 (both headers and payload, no matter the MIME type)

@PXshadow
Copy link
Owner

This PR has ballooned more than I expected, so it's time for me to give you a quick status update.

What should work (as I understand it):

  • Incoming and outgoing headers' names are compared in a case-insensitive way.
  • Incoming and outgoing headers' names are checked for non-ASCII bytes.
  • Incoming and outgoing headers' values are checked for control characters.
  • Incoming headers which are known to have list-type values separate them.
  • Outgoing headers which are known to have list-type values merge them together.
  • Outgoing headers which are known to be unique are checked for the existence of multiple values.

What's not done yet, but I'm not sure if I want to cram it into the same patch:

  • Incoming header values with "quoted text" do not get recognized and unquoted
  • Incoming header values with line continuations ("LWS" in RFC 2616) do not get recognized as a single header.
  • Incoming data is not treated as ISO-8859-1, but always as UTF8 (both headers and payload, no matter the MIME type)

I'd say split it up, you can group it, back I think we can merge this in, is there any other things you'd like to include in it?

@Frixuu
Copy link
Contributor Author

Frixuu commented Sep 29, 2024

Nah; if there was something I desperately needed to include, I'd set the PR as draft 😅 How would you like to proceed?

@PXshadow
Copy link
Owner

PXshadow commented Oct 2, 2024

I'll take a look today and most likely merge it.

return arr;
}

public static function populate():Array<Field> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the build macro approach well done.

@PXshadow PXshadow merged commit 90426fc into PXshadow:master Oct 3, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants