-
-
Notifications
You must be signed in to change notification settings - Fork 696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add check for text file encodings #685
base: main
Are you sure you want to change the base?
Conversation
Don't know if I'm too happy with the platform dependent default, perhaps defaulting to utf-8 would be better. No strong opinions between those. |
Also, incremental decoding instead of Can polish once we establish this will be accepted in the first place. |
From #683 (comment)
Sadly, not all developers run test suites or test their code before pushing code to repos. Not all repos even have test suites for that matter. It's useful to be able to catch errors locally in pre-commit before they hit CI, or even staging/production. Anyway, I think the opinion what qualifies as a "decent programming language" belongs here. No matter what one thinks of them, for example C and JavaScript and Perl code can be encoded let's say in ISO-8859-1 without any encoding markers. I don't think there even is a way to convey that at least for C code, not sure for JS; for Perl there is Chances are that the ISO-8859-1 in them is not the intention though and they now don't behave correctly, perhaps one wanted UTF-8 instead. Or perhaps one wanted code that is not that brittle wrt source encoding in the first place and for that reason wants to restrict everything to ASCII to avoid bugs of this class altogether. Or maybe it's not a runtime issue, one just wanted some comments/API docs in them in UTF-8 but a badly configured editor did it in ISO-8859-1 or vice versa, and now docs are not being generated right. This check would help. From #683 (comment)
What I'm trying to say is that just like pre-commit, this check can be useful for repos that don't have anything whatsoever to do with programming. Websites built with markdown, static HTML + CSS, other text content, including plain text, for whatever purpose. |
yeah agreed, but I don't think it's our place to supply those checks when a language-specific tool can provide better diagnostic SyntaxErrors when it's a problem for them my main concerns with this are:
|
Some counterarguments:
|
This would be useful in checking any kind of text files are encoded in the desired user-configurable encoding. Regardless of whether it treats or mitigates the CVE, this has value in tons of repositories. Also, language-specific tooling means a lot of different things to a lot of people. I use vim and write a lot of terraform code and yaml for kubernetes deployments. At least for Terraform, the cli will error out if, for example, you read a file with the What I need is exactly this pre-commit hook to prevent files that have any encoding other than UTF-8 from being committed in the first place. Respectfully, I request the addition of this hook. |
Refs #683