Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More strongly-typed HTTP and HTTPS URIs #162

Open
torinnd opened this issue Oct 17, 2022 · 7 comments
Open

More strongly-typed HTTP and HTTPS URIs #162

torinnd opened this issue Oct 17, 2022 · 7 comments

Comments

@torinnd
Copy link
Contributor

torinnd commented Oct 17, 2022

I find myself reaching for a more strongly typed URI when the scheme is one of "http" or "https". As per https://httpwg.org/specs/rfc9110.html#rfc.section.4.2 the following semantics need to be enforced for absolute URIs with one of these two schemes:

  • The [scheme] field must be populated with exactly "http" or "https"
  • The [userinfo] field is deprecated and must be omitted
  • The [host] field is required

One way this could be addressed is by leveraging a GADT like:

type 'a t

val host : [< `Base | `Http ] t -> string option
val host' : [ `Http ] t -> string

val scheme : [< `Base | `Http ] t -> string option
val scheme' : [ `Http ] t -> [ `Http | `Https ]

This would be a non-trivial API change for URI, of course. A smaller change could be to introduce a submodule into the MLI for an [Absolute_http.h] with appropriate conversion functions:

val of_t : t -> (h, [ `Msg of string]) result
val to_t : h -> t

... and a subset of the [Uri] MLI adapted to the constraints enforced by the scheme (e.g. [make] would require [host], omit [userinfo], and accept a more constrained set of [scheme]'s).

I have a fairly narrow understanding of other use cases for URI (for example, I see there is an old Issue #158 for Websocket support, which feels related), but imagine there are additional considerations besides what I've raised here. The big win for me personally would be a calling pattern where I could be confident that a [host] is present.

@avsm
Copy link
Member

avsm commented Oct 17, 2022

Introducing a submodule sounds fine, and we can move it out to a toplevel Absolute_uri in the future. I'm not sure that Absolute is the right prefix though; perhaps Uri_with_host or Uri.With_host is more descriptive but more unwieldy. It would be good to check through the rest of RFC9110 and make sure there's nothing else aside from the host that needs attention, while we're doing this.

@torinnd
Copy link
Contributor Author

torinnd commented Oct 17, 2022

The deprecation of userinfo also seems important for RFC-compliance (https://httpwg.org/specs/rfc9110.html#rfc.section.4.2.4). A rough version of the submodule approach could be:

# mli
(** Specializations for HTTP and HTTPS schemes as per RFC9110 *)
module Absolute_http : sig
  type h

  val of_t : t -> (h, [ `Msg of string ]) result
  val to_t : h -> t

  val of_string : string -> (h, [ `Msg of string ]) result
  val to_string : ?pct_encoder:pct_encoder -> h -> string

  val make : scheme:[ `Http | `Https ]-> host:string ->
    ?port:int -> ?path:string -> ?query:(string * string list) list ->
    ?fragment:string -> unit -> h
end

# ml
module Absolute_http = struct
  type h =
    { scheme : [ `Http | `Https ]
    ; host : Pct.decoded
    ; port : int option
    ; path : Path.t
    ; query : Query.t
    ; fragment : Pct.decoded option
    }

  let ( let* ) = Result.bind

  let to_t { scheme; host; port; path; query; fragment } =
    let scheme =
      match scheme with
      | `Http -> Pct.cast_decoded "http"
      | `Https -> Pct.cast_decoded "https"
    in
    { scheme = Some scheme
    ; userinfo = None
    ; host = Some host
    ; port
    ; path
    ; query
    ; fragment
    }
  ;;

  let of_t { scheme; userinfo; host; port; path; query; fragment } =
    let* scheme =
      match scheme with
      | None -> Error (`Msg "No scheme present in URI")
      | Some scheme ->
        (match Pct.uncast_decoded scheme with
         | "http" -> Ok `Http
         | "https" -> Ok `Https
         | unsupported_scheme ->
           Error
             (`Msg
                (Printf.sprintf
                   "Only http and https URIs are supported. %s is invalid."
                   unsupported_scheme)))
    in
    let* host = Option.to_result ~none:(`Msg "host is required for HTTP(S) uris") host in
    let* () =
      match userinfo with
      | None -> Ok ()
      | Some (_ : string * string option) ->
        Error (`Msg "userinfo is a deprecated field in HTTP(S) URIs as per RFC9110")
    in
    Ok { scheme; host; port; path; query; fragment }
  ;;

  let of_string s = of_string s |> of_t
  let to_string ?pct_encoder h = to_t h |> to_string ?pct_encoder

  let normalize h =
    {h with
        host=Pct.cast_decoded (String.lowercase_ascii (Pct.uncast_decoded h.host))
       }

  let make ~scheme ~host ?port ?path ?query ?fragment () =
    let decode = function
      |Some x -> Some (Pct.cast_decoded x) |None -> None in
    let path = match path with
      |None -> [] | Some p ->
        let path = path_of_encoded p in
        match path with
        |  "/"::_ |  [] -> path
        | _  -> "/"::path
    in
    let query = match query with
      | None -> Query.KV []
      | Some p -> Query.KV p
    in
    normalize
      { scheme;
        host=Pct.cast_decoded host; port; path; query; fragment=decode fragment }
end

@avsm
Copy link
Member

avsm commented Oct 17, 2022

I see that Absolute_uri is a term from RFC9110, so we could just call this Uri.Absolute to indicate the mandatory presence of a non-empty path.

@avsm
Copy link
Member

avsm commented Oct 17, 2022

I think it might be worth separating the errors out for a deprecated user_info (which clients could validly choose to allow to pass through, after stripping it out) from the missing host error.

@torinnd
Copy link
Contributor Author

torinnd commented Oct 17, 2022

I see that Absolute_uri is a term from RFC9110, so we could just call this Uri.Absolute to indicate the mandatory presence of a non-empty path.

Because the specificity seems important, I want to call out that an Absolute URI implies the presence of a hier-part, not necessarily an authority section (which is where a host identifier can exist):

   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

I don't think this would satisfy the requirements in RFC9110, so it's probably not specific enough to describe this as a [Uri.Absolute.t]:

A sender MUST NOT generate an "http" URI with an empty host identifier. A recipient that processes such a URI reference MUST reject it as invalid.

I'll also offer mild pushback on the userinfo section, based on the language of the RFC:

A sender MUST NOT generate the userinfo subcomponent (and its "@" delimiter) when an "http" or "https" URI reference is generated within a message as a target URI or field value.

Before making use of an "http" or "https" URI reference received from an untrusted source, a recipient SHOULD parse for userinfo and treat its presence as an error

I suppose there's an argument to be made that the client-side SHOULD isn't strong enough to forbid this, but it seems to me that erroring when attempting to parse a Uri.t as a RFC9110-complaint HTTP(S) URI is aligned with the spirit of the RFC, if not the exact language.

@avsm
Copy link
Member

avsm commented Oct 17, 2022

An error doesn't mean that the connection should be aborted; it means that the URI shouldn't be propagated. From that RFC text, it seems valid for me for an HTTP proxy to spot a user-info field, remove it (perhaps authenticating out of band), and to make a downstream request without it. We should make sure our URI interfaces support that. It might be that it happens within Uri and not in Absolute_uri though.

@torinnd
Copy link
Contributor Author

torinnd commented Nov 25, 2022

I've cleaned up the proposed changes above in #164

avsm added a commit to avsm/opam-repository that referenced this issue Apr 19, 2023
CHANGES:

* Add `Uri.Absolute_http`, an RFC9110-compliance specialization of a
  `Uri.t`. (mirage/ocaml-uri#164 mirage/ocaml-uri#162 @torinnd).
* Add a `uri-bench` package for the benchmarking dependencies in this
  repository (mirage/ocaml-uri#166 @tmcgilchrist).
kit-ty-kate pushed a commit to avsm/opam-repository that referenced this issue Apr 26, 2023
CHANGES:

* Add `Uri.Absolute_http`, an RFC9110-compliance specialization of a
  `Uri.t`. (mirage/ocaml-uri#164 mirage/ocaml-uri#162 @torinnd).
* Add a `uri-bench` package for the benchmarking dependencies in this
  repository (mirage/ocaml-uri#166 @tmcgilchrist).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants