Fixing the issue of Chinese character garbled encoding in CSV export. #1288

Closed
wants to merge 3 commits into from

Conversation

qqAys
Contributor

@qqAys qqAys commented Feb 4, 2024

Fixing the issue of garbled characters (Chinese characters) when opening the CSV export in Microsoft Excel.
With version 6.1.3, a CSV export containing Chinese text shows garbled characters when opened in Microsoft Excel with its default settings in a Chinese-language environment, because of an encoding issue. In the image, the top part is the export from the current version and the bottom part is the export after the fix.
[Image: Snipaste_2024-02-04_14-31-15]

@almet
Member

almet commented Feb 6, 2024

Thanks. This seems to be failing the tests though… We might need to change the way the .csv files are loaded.

@qqAys
Contributor Author

qqAys commented Feb 19, 2024

Happy Chinese New Year! Thank you for bringing this to my attention. I've made the necessary adjustments to address the test failures. Regarding the loading of the .csv files, I've implemented some changes to ensure proper handling. Please let me know if you encounter any further issues or if there's anything else I can assist with.

@zorun
Collaborator

zorun commented Mar 19, 2024

utf-8-sig seems to add a BOM sequence at the start of the CSV file.

Is that standard, and can that break other clients? It feels like Excel is the problem here, it should correctly handle UTF-8 encoded files.
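
For reference, the BOM behaviour mentioned above can be checked directly. This is a minimal sketch using only the Python standard library; the sample header text is made up for illustration and is not taken from the project.

```python
import codecs

# Hypothetical CSV header containing Chinese text (illustrative only).
text = "名称,金额\r\n"

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# The utf-8-sig codec prepends the three-byte UTF-8 BOM (EF BB BF);
# the rest of the payload is identical to plain UTF-8.
assert with_bom == codecs.BOM_UTF8 + plain
print(with_bom[:3].hex())  # efbbbf
```

So the only difference between the two encodings on output is those three leading bytes.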

@qqAys
Contributor Author

qqAys commented Mar 21, 2024

> utf-8-sig seems to add a BOM sequence at the start of the CSV file.
>
> Is that standard, and can that break other clients? It feels like Excel is the problem here, it should correctly handle UTF-8 encoded files.

Thanks for your comment! Indeed, I found that changing the encoding of the CSV file from UTF-8 to UTF-8-SIG solves the problem of garbled Chinese characters. UTF-8-SIG adds a byte order mark (BOM) at the beginning of the file, which lets software such as Excel detect that the file is UTF-8 and decode it correctly.

Regarding the issue you mentioned about other clients possibly experiencing problems, it depends mainly on how those clients are implemented. Most modern software can handle UTF-8 files with a BOM correctly, but older versions or software with specific settings might struggle with the BOM. Therefore, using UTF-8-SIG requires consideration of compatibility with different clients.
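
One way a client can "struggle with the BOM" is shown in the sketch below. It assumes a consumer that decodes the file as plain UTF-8 before parsing; the column names and values are invented for the example.

```python
import csv
import io

# Bytes as they would be written with the utf-8-sig codec (sample data is hypothetical).
data = "name,amount\r\nAlice,10\r\n".encode("utf-8-sig")

# A reader that decodes as plain UTF-8 keeps the BOM as U+FEFF in the first field.
naive = list(csv.reader(io.StringIO(data.decode("utf-8"))))

# A BOM-aware reader (utf-8-sig) strips it before parsing.
aware = list(csv.reader(io.StringIO(data.decode("utf-8-sig"))))

assert naive[0][0] == "\ufeffname"  # header no longer matches "name"
assert aware[0][0] == "name"
```

In the naive case a lookup by the header `name` would fail, even though the file looks identical in most text editors.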

As for whether Excel should be able to handle UTF-8 encoded files correctly, that is indeed an important question. Excel can read UTF-8, but when a CSV file without a BOM is opened directly, it typically falls back to the system's legacy code page rather than UTF-8, which is what produces the garbled text. Fixing that would be up to the software vendor.

In conclusion, using UTF-8-SIG is an effective solution, but it's important to balance and adjust based on the actual situation to ensure the best compatibility and user experience. Thanks again for your comment and suggestion!

@almet
Member

almet commented Jan 3, 2025

> In practice, you can generate either RFC 4180 compliant CSV files, or you can generate Microsoft compatible CSV files.
>
> (Stack Overflow)

I'd rather not add a BOM to all our CSV files. If we want to be able to generate Excel-compatible CSV files, then we should probably offer the user the option to choose between the two file formats.

To me, adding a BOM to all the generated .csv files looks like it could break other readers that follow RFC 4180, which we probably don't want either.
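
A possible shape for such an option is sketched below. The function name `export_csv` and the `excel_compatible` flag are hypothetical and do not come from the project's code; the sketch only illustrates switching the encoding per download.

```python
import csv
import io


def export_csv(rows, excel_compatible=False):
    """Return CSV bytes for `rows` (hypothetical helper, not the project's API).

    With excel_compatible=True the output starts with a UTF-8 BOM so that
    Excel detects the encoding; otherwise plain UTF-8 is produced.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # default dialect writes CRLF line endings
    writer.writerows(rows)
    encoding = "utf-8-sig" if excel_compatible else "utf-8"
    return buffer.getvalue().encode(encoding)


# Usage: the same rows, exported in both variants.
rows = [["名称", "金额"], ["午餐", "12"]]
standard = export_csv(rows)
for_excel = export_csv(rows, excel_compatible=True)
assert for_excel == b"\xef\xbb\xbf" + standard
```

Keeping the default at plain UTF-8 leaves existing consumers unaffected, while the BOM variant is opt-in.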

@almet
Member

almet commented Jan 5, 2025

Closing this issue for now as I believe we don't want to merge it as-is. As I offered, feel free to reopen with a different approach proposing multiple download formats.

Also: thanks for your patience with the delay on all this. The project has been unmaintained for the last year or so, and we're getting back to it :-)

@almet almet closed this Jan 5, 2025