Explicit encoding #100

Closed
wants to merge 2 commits into from

Conversation

dmerejkowsky
Collaborator

Fix #89

@dmerejkowsky
Collaborator Author

I've also added a semgrep rule so that we always specify an explicit encoding in the future.

@cgestes

cgestes commented May 22, 2021

The patch seems to override the encoding defined in the system to force utf-8.

Maybe I don't fully understand the situation or the Python side of things, but it looks like the bug this patch is supposed to fix is as follows: the system is configured with an encoding other than utf-8, while the file being read is utf-8 (or at least decodes without error as utf-8).

Are all the files processed by tbump utf-8? Or is Python doing some magic when passed encoding="utf-8"?
If not, I would rather allow overriding the default system encoding, but make it optional.
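
To make the scenario concrete, here is a minimal sketch of the "optional override" idea. The `read_text` helper is hypothetical (it is not tbump's actual API); it only relies on the fact that passing `encoding=None` to `Path.read_text` lets Python fall back to the locale's preferred encoding, which is presumably `gbk` on the machine from #89.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_text(path: Path, encoding: Optional[str] = None) -> str:
    # Hypothetical helper, not part of tbump.
    # encoding=None keeps Python's default: the locale's preferred encoding.
    # Passing a value (e.g. "utf-8") overrides the system default.
    return path.read_text(encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "version.txt"
    # A UTF-8 file containing a non-ASCII character
    path.write_bytes("price: 10€\n".encode("utf-8"))

    # Explicit utf-8 works regardless of how the system is configured
    text = read_text(path, encoding="utf-8")

    # Simulate a system whose default encoding is gbk
    try:
        read_text(path, encoding="gbk")
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True
```

With this shape, callers who know their files are utf-8 can say so, and everyone else keeps the system behavior.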

@cgestes

cgestes commented May 22, 2021

Another path to explore would be to work on the raw content.
Is an encoding needed at all?
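
A rough sketch of what working on raw content could look like, assuming the version strings are plain ASCII and the file uses an ASCII-compatible encoding (utf-8, gbk, latin-1, but not utf-16). The `replace_version_raw` name is hypothetical and only meant to illustrate the idea:

```python
import tempfile
from pathlib import Path


def replace_version_raw(path: Path, old: str, new: str) -> int:
    """Hypothetical helper: swap `old` for `new` without decoding the file."""
    # Assumption: version strings only contain ASCII characters
    old_bytes = old.encode("ascii")
    new_bytes = new.encode("ascii")
    data = path.read_bytes()
    count = data.count(old_bytes)
    # Every other byte is written back untouched, whatever its encoding
    path.write_bytes(data.replace(old_bytes, new_bytes))
    return count


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "setup.cfg"
    # A gbk-encoded file; the helper never needs to know that
    path.write_bytes("# 版本\nversion = 1.2.3\n".encode("gbk"))
    replaced = replace_version_raw(path, "1.2.3", "1.3.0")
    result = path.read_bytes()
```

The trade-off is that any pattern matching has to be done on bytes, so this would only fit if the search patterns themselves stay ASCII.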

@dmerejkowsky
Collaborator Author

Meh. This was mostly an excuse to try out semgrep. And it works :)

I'm still not sure about the root cause of #89, so let's close this

Successfully merging this pull request may close these issues.

UnicodeDecodeError: 'gbk' codec can't decode: