-
-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate ext-parameter when utf8 and latin1 representations don't match #27
Comments
Ah, I see. So the default module of this module is to follow the specification, as in just because Chrome does not follow the actual specification does not make the bug in this module, as of course there are other clients who do follow the spec. What if instead of just generating a longer string when not actually required this module adds an option to allow you to always generate both, for example, so you can decide what to do (maybe sniff the user-agent header and only do it then, or unconditionally if that is what you want)? |
Another potentional option is that this module could kick in the use of extension parameter if the filename falls outside of US-ASCII instead of ISO-8859-1, though in order to know if that would actually be effective, we would need a much better understanding of what exactly these buggy clients are doing with the bytes in the file name, as not all single byte character schemes match the US-ASCII set for the first 127 characters. |
I'll create a Chromium issue as well for this. However, the specification advises to use ASCII rather than ISO-8859-1:
This strategy seems to be followed by Django, Flask, and Rails. |
Right, probably the reason that is added to the appendix is because there are buggy clients out there who are not following the specification, as they are trying to account for buggy servers who are accidentally generated bad header values in the wrong character set. It is definitely a tricky problem, which is why I generally try to follow the actual standards, since they actually spell out the way to do it. That is why I also provided the second suggestion above, though. The spec itself clearly spells out that the header is ISO-8859-1 character encoding, regardless of buggy client implementations. So this implementation is not buggy here. I suggested that maybe we can add options so you can make the decisions you like to better control the behavior that would work better for your case (my first comment). Would that work for you? |
I appreciate your quick response, and I understand wanting to avoid sending extra data for misbehaving clients, but I think this module would be better if by default it followed the advice of the spec and the behavior of other popular web server frameworks. I understand if you disagree — an option would also be helpful. Thanks! |
Right. Of course the RFC has specific wording for the spec, like MUST NOT and SHOULD NOT, etc. The appendix is not even a main point of the spec, relegated to a footnote, without even using the standard wording, so it's hard to draw concrete instructions from it. So to be clear: I'm not saying I am not going to change anything, and your discussion on this is valuable, so please do not feel dismayed. |
I might be missing something, but I don't actually see the spec use MUST, SHOULD, MAY, etc. in regards to using ISO-8859-1. It seems to only mention ISO-8859-1 when talking about the "filename*" parameter, and how it allows characters outside the ISO-8859-1 character set. This does imply that the "filename" parameter can be ISO-8859-1, but it doesn't actually come out and say that anywhere else, so it seems a little ambiguous to me. However, I do think the appendix is clear about the recommendation to avoid non-ASCII characters. Here is the Chromium bug I filed: https://bugs.chromium.org/p/chromium/issues/detail?id=1006345. Thanks again for considering a change here! |
This is because since this spec is in relation to a HTTP header, the spec for HTTP itself is what declares header values as ISO-8859-1, so it only passively mentions it since it assumes that as prior knowledge to the specific Content-Disposition header specification. You can see they try to provide the needed link a few times in the spec, notably in https://tools.ietf.org/html/rfc6266#appendix-C
Yes, I do not disagree here. I was just noting it was relegated to an appendix, when it could have been made an actual rule by either not defining the ABNF of the the header as
in which case the
with TEXT being
or annotating in the main part of the spec that TEXT "MUST NOT" contain characters above 0x7f (which, also, they could have just not used the |
Looking around it seems like Firefox has a similar bug open: https://bugzilla.mozilla.org/show_bug.cgi?id=588409 |
From the new spec:
I guess |
Yea, the CD header existed well before the new specs were written, back in the historical days. The main reason why this module is emitting them was due to users who needed this (back when it was part of expressjs before extracted into it's own module). Many languages are covered by latin1 and there are clients where when the extended filename parameter is included they fail to parse the header. So back then the compromise was to make it so the full usage of latin1 would result in just the plain filename parameter, allowing things like spanish, french, and more all to generate the header without the extended parameter being there and working for various clients. There are many more clients out there than web browsers, for example both cURL and wget command line tools parse this header to determine the filename to save to disk. This is a hard balancing act and so that I don't have to read every client implementation out there, the compromise was fairly straight forward: just follow the spec and since it allows the usage of iso-8859-1 (latin1) in the basic filename parameter it made everyone happy. Apparently for many mant years until now...? Haha |
And, ultimately, nothing in the specs seems to be pointing to Chrome is doing the correct thing. At best, the specs admit that there are buggy clients out there (like Chrome, I guess). |
Fair enough, thank you for that explanation. |
A Chromium developer has responded to the bug:
|
This library will only generate the extended filename parameter if the filename is not representable in ISO-8859-1 (checking against NON_LATIN1_REGEXP).
However, some clients will not try to interpret the filename as ISO-8859-1. Chromium's Content-Disposition parsing code tries to interpret it as UTF-8, a referrer_charset, and the native OS default charset in turn. This ends up sometimes working for normal downloads in Chromium since it passes a referrer_charset that defaults to
windows-1252
. But this doesn't work for some other locales or cases where this referrer_charset isn't passed (like downloads in Electron).It seems like it would be useful to generate the extended parameter when the utf8 and latin1 representations don't match.
The text was updated successfully, but these errors were encountered: