Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuration of pct encoding #65

Open
trevorsummerssmith opened this issue Mar 29, 2015 · 13 comments
Open

Allow configuration of pct encoding #65

trevorsummerssmith opened this issue Mar 29, 2015 · 13 comments
Milestone

Comments

@trevorsummerssmith
Copy link

I am writing a simple client to interact with AWS S3.

Their docs state: This requires a uri encoding by:
URI encode every byte. Uri-Encode() must enforce the following rules:

  • URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.
  • The space character is a reserved character and must be encoded as "%20" (and not as "+").
  • Each Uri-encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte.
  • Letters in the hexadecimal value must be uppercase, for example "%1A".
  • Encode the forward slash character, '/', everywhere except in the object key name. For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded.

As discussed in other issues on this project, the standards, and the reality with this sort of thing are very different. I think the answer is to give the end user a few functions to exude more control. This will allow for standards compliant apis, and arbitrarily not-standards-compliant apis to be used.

Thoughts? Thanks!
Trevor

@rgrinberg
Copy link
Member

+1

IMO, AWS is important enough for Uri to help out users trying to interface with it even if it will cost some complexity.

@dsheets
Copy link
Member

dsheets commented Apr 1, 2015

It looks like we would need some way to control the literalness of encoding particularly for the unencoded set and '/'. The '/' issue is well-known but this is the first time I've seen a requirement about the unencoded set. Support for this should be included.

@rgrinberg
Copy link
Member

@dsheets Would it be possible to tackle this before 2.0 as it is a pretty annoying blocker that you can't get around (if you're using cohttp as your client).

I'm willing to give this a try as well if you give me some directions. From the brief look I've had, it would be possible to just add a query_scheme parameter in Uri.t and then it's just a matter of implementing an AWS compatible Scheme module that handles safe chars for ``Query | Query_key | Query_value`. Am I on the right page?

@agarwal
Copy link

agarwal commented Sep 24, 2015

I'm also hitting problems with this, also for working with S3. FWIW, I'd be quite happy with more types that represent different encodings. A string isn't nearly specific enough.

The specific problem I'm currently having is to encode slashes in query values. The issue seems to be that \ is not encoded:

# Uri.make ~query:["foo", ["ba/r"]] () |> Uri.to_string;;
- : string = "?foo=ba/r"

Trying to work around this by encoding it myself doesn't work because % is encoded:

# Uri.make ~query:["foo", ["ba%2Fr"]] () |> Uri.to_string;;
- : string = "?foo=ba%252Fr"

The output I want is "?foo=ba%2Fr".

@hcarty
Copy link
Contributor

hcarty commented Sep 30, 2015

This would be useful when working with a library like zeromq as well. If you want to listen to connections from any outside host the host should be * which ends up escaped:

# Uri.make ~scheme:"tcp" ~host:"*" ~port:5555 ()
- : Uri.t = tcp://%2A:5555

@sazarkin
Copy link

sazarkin commented Aug 5, 2019

Sorry for resurrecting the dead issue.
But this library does not seems to be compatible with RFC
Compare with python query encoding:

>>> urllib.parse.urlencode({'foo':'web/games-applications?title=&sort_bef_combine=+&sort_order=&sort_by=&page=36'})
'foo=web%2Fgames-applications%3Ftitle%3D%26sort_bef_combine%3D%2B%26sort_order%3D%26sort_by%3D%26page%3D36'

and Uri

utop # Uri.make ~query:["foo", [Uri.pct_decode "web/games-applications?title=&sort_bef_combine=+&sort_order=&sort_by=&page=36"]] () |> Uri.to_string;;
- : string =
"?foo=web/games-applications?title=%26sort_bef_combine=%2B%26sort_order=%26sort_by=%26page=36"

@Lupus
Copy link

Lupus commented Aug 6, 2019

@avsm any thoughts on this? We are forced to vendor ocaml-uri and adjust encoding for our business needs, which is something we try to avoid if at all possible. I'm wondering how many other vendored versions exist in the wild and how much effort is wasted on constant rebasing with upstream. Even a simple (and may be ugly) way to customize encoding would be very welcome by the users.

@tmcgilchrist
Copy link
Member

@avsm is someone actively working on a fix for this? I'm using this in ocaml-aws.

@Lupus Are you able to share your vendored copy of ocaml-uri publicly? I'd be interested in what changes you needed to get it working?

@avsm
Copy link
Member

avsm commented Apr 2, 2020

No one is working on this issue at the moment; I’m happy to review a PR

@tmcgilchrist
Copy link
Member

Understood, I'll start reading the code ;-)

@avsm
Copy link
Member

avsm commented Apr 2, 2020

Much appreciated. It might be easier done over #142 which I’d like to merge soon

@Lupus
Copy link

Lupus commented Apr 2, 2020

@tmcgilchrist we ended up working around this in other services so that URI encodings are compatible (we had hand-rolled encodings there anyways), that was easier than maintaining an internal fork.

@tmcgilchrist
Copy link
Member

tmcgilchrist commented Sep 2, 2020

#147 looks to cover angstrom support from #142 and PCT encoding. I'm in the process of testing that for ocaml-aws.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants