gh-104400: Add more tests to pygettext #108173

Merged: 21 commits, Nov 3, 2024

tomasr8
Member

@tomasr8 tomasr8 commented Aug 20, 2023

As suggested here, I'm splitting the PR which updates pygettext into multiple smaller PRs. This first one just adds more comprehensive tests to pygettext without changing any functionality.

The test cases I added cover message and docstring extraction, neither of which was tested before. I also added a specific test for file locations.

Instead of having a test case for each possibility, I group the test cases into files, run the script on each file, and compare the entire script output with an expected file. This has several advantages:

  • Because we check the whole script output, we can be sure that things like file locations, flags, line wrapping, etc. are also correct. The current tests only check for the presence of a given msgid, which is insufficient.
  • The tests are (imo) more readable when in a single Python file
  • Less code needed to add/remove tests
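
The difference between a presence check and a whole-output comparison can be sketched as follows. This is only an illustration: `snapshot_diff` and the sample POT fragments are hypothetical and are not the helpers used in this PR.

```python
import difflib


def snapshot_diff(expected: str, actual: str) -> str:
    """Return a unified diff of two outputs, or '' if they are identical."""
    return "\n".join(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="expected", tofile="actual", lineterm=""))


# Hypothetical POT fragments that differ only in the reported line number.
expected = '#: messages.py:3\nmsgid "foo"\nmsgstr ""\n'
actual = '#: messages.py:4\nmsgid "foo"\nmsgstr ""\n'

# A presence check cannot see the wrong file location...
presence_ok = 'msgid "foo"' in actual
# ...but comparing the entire output does.
diff = snapshot_diff(expected, actual)
```

Here `presence_ok` is true even though the location comment is wrong, while `diff` is non-empty and points at the offending line.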

While writing the tests, I actually found more bugs than in the original PR. I documented these in the tests - I don't think it's worth fixing these now as most of these will be fixed automatically once pygettext is converted to use the AST.

@AA-Turner
Member

@tomasr8 Thanks for opening the new PR! Could you look at the failing tests please?

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/messages.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[682 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[683 chars]\n\n'
Diff is 1325 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/docstrings.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[340 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[341 chars]\n\n'
Diff is 956 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/fileloc.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[294 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[295 chars]\n\n'
Diff is 907 characters long. Set self.maxDiff to None to see it.

@tomasr8
Member Author

tomasr8 commented Aug 20, 2023

hmm looks like an encoding issue. Hopefully, this'll fix it.

@AA-Turner
Member

Looks like the same three tests failed. Do you have access to a Windows computer? If not I should be able to have a look later on.

@tomasr8
Member Author

tomasr8 commented Aug 20, 2023

No worries! Luckily I have a Windows machine lying around 😅 The problem is that there's no way to specify the output encoding, so pygettext uses the platform default. This makes it difficult to compare the files because the charset is also part of the header. I'll just normalize it the same way as I do with the creation date.
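
A rough sketch of that kind of normalization, assuming the header uses the usual `POT-Creation-Date:` and `charset=` fields. The function name, the placeholder values, and the sample headers are made up for illustration and may differ from what the PR actually does.

```python
import re


def normalize_pot_header(pot: str) -> str:
    """Replace run-dependent header fields with fixed placeholders."""
    # The creation date changes on every run; stop at the literal backslash
    # of the trailing "\n" escape inside the header string.
    pot = re.sub(r'POT-Creation-Date: [^\\\n]*',
                 'POT-Creation-Date: 2000-01-01 00:00+0000', pot)
    # The charset follows the platform default (e.g. cp1252 on Windows).
    pot = re.sub(r'charset=[^\\\n"]+', 'charset=UTF-8', pot)
    return pot


# Hypothetical header fragments as they might be produced on two platforms.
windows_header = "\n".join([
    r'"POT-Creation-Date: 2023-08-20 10:00+0200\n"',
    r'"Content-Type: text/plain; charset=cp1252\n"',
])
linux_header = "\n".join([
    r'"POT-Creation-Date: 2023-08-20 08:05+0000\n"',
    r'"Content-Type: text/plain; charset=UTF-8\n"',
])

normalized_windows = normalize_pot_header(windows_header)
normalized_linux = normalize_pot_header(linux_header)
```

After normalization the two headers compare equal, so the rest of the file can be compared exactly.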

Member

@AA-Turner AA-Turner left a comment

Thanks! Some comments on the test code:

(Seven review threads on Lib/test/test_tools/test_i18n/test_i18n.py, all marked outdated and resolved.)
@AA-Turner AA-Turner added the "tests" (Tests in the Lib/test dir) label Aug 28, 2023
Member

@AA-Turner AA-Turner left a comment

Thanks!

@tomasr8
Member Author

tomasr8 commented Aug 29, 2023

Thanks!

Thanks for the review!

@erlend-aasland
Contributor

cc. @warsaw who asked for a ping on Discourse :)

@@ -1,6 +1,8 @@
"""Tests to cover the Tools/i18n package"""

import os
from pathlib import Path
Member

Is it necessary to use pathlib? Other tests simply use os.path.

Member Author

There were only a handful of uses of os so I went ahead and replaced them with pathlib which I think improves readability, but I'm happy to revert the change if you prefer to keep os :)

(Two review threads, on Lib/test/test_tools/test_i18n/test_i18n.py and Lib/test/test_tools/test_i18n/__init__.py, both marked outdated and resolved.)
@tomasr8
Member Author

tomasr8 commented Oct 28, 2024

I also added a --snapshot-update CLI argument to make it easy to regenerate the snapshots (as is already the case with some ast and recently argparse tests)
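
As a rough illustration of how such a switch can be wired into a test module, here is a minimal sketch. The helper names `wants_snapshot_update` and `dispatch` are hypothetical, and the actual handling in the PR may look different.

```python
def wants_snapshot_update(argv: list[str]) -> bool:
    """Return True if the first command-line argument is --snapshot-update."""
    return len(argv) > 1 and argv[1] == "--snapshot-update"


def dispatch(argv, update_snapshots, run_tests) -> str:
    """Regenerate the snapshots if requested, otherwise run the tests."""
    if wants_snapshot_update(argv):
        update_snapshots()
        return "updated"
    run_tests()
    return "tested"


# Stand-in callables; a real test module would pass its snapshot
# regeneration function and unittest.main here.
calls = []
outcome = dispatch(["test_i18n.py", "--snapshot-update"],
                   update_snapshots=lambda: calls.append("update"),
                   run_tests=lambda: calls.append("test"))
```

With the flag present only the regeneration callable runs; without it the tests run as usual.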

def update_POT_snapshots():
    for input_file in DATA_DIR.glob('*.py'):
        output_file = input_file.with_suffix('.pot')
        contents = input_file.read_text(encoding='utf-8')
Member

It would be nice to have some files with non-UTF-8 encoding.

Since contents is only used to copy a file, you can read/write the binary content.
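
A small sketch of the byte-for-byte copy suggested here, using `Path.read_bytes` and `Path.write_bytes`. The helper `copy_verbatim`, the file names, and the Latin-1 sample content are hypothetical.

```python
import tempfile
from pathlib import Path


def copy_verbatim(src: Path, dst: Path) -> None:
    """Copy a file byte for byte, with no decoding step."""
    dst.write_bytes(src.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input file saved as Latin-1 rather than UTF-8.
    src = Path(tmp) / "latin1_source.py"
    src.write_bytes("# -*- coding: latin-1 -*-\n_('caf\xe9')\n".encode("latin-1"))
    dst = Path(tmp) / "copy.py"

    copy_verbatim(src, dst)
    identical = dst.read_bytes() == src.read_bytes()

    # Reading the same file as UTF-8 text fails on the 0xE9 byte.
    try:
        src.read_text(encoding="utf-8")
        utf8_read_failed = False
    except UnicodeDecodeError:
        utf8_read_failed = True
```

Because no decoding happens, the copy works regardless of the encoding declared in the input file.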

        with temp_cwd(None):
            Path(input_file.name).write_text(contents)
            assert_python_ok(Test_pygettext.script, '--docstrings', input_file.name)
            output = Path('messages.pot').read_text()
Member

When you read text, always specify the encoding.

Member Author

This causes problems on Windows, where the encoding is cp1252 so reading it back as utf8 fails. I don't know how else to get around this besides forcing pygettext to always output utf8 (or adding a configurable parameter). Do you have any suggestions?

Member

We should use -Xutf8 or PYTHONIOENCODING=utf-8 to run pygettext, because the text can be non-encodable with the locale encoding.
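
A minimal sketch of the `-X utf8` approach, using `subprocess` directly instead of the test-suite helpers. The child snippet and the file name `out.txt` are stand-ins for pygettext and its output file, so treat this as an illustration only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a script that writes a text file without naming an encoding.
CHILD = "with open('out.txt', 'w') as f: f.write('caf\\u00e9 \\u20ac')"

with tempfile.TemporaryDirectory() as tmp:
    # In UTF-8 mode, open() defaults to UTF-8 instead of the locale encoding.
    subprocess.run([sys.executable, "-X", "utf8", "-c", CHILD],
                   cwd=tmp, check=True)
    # The parent can then read the file back with an explicit encoding.
    text = Path(tmp, "out.txt").read_text(encoding="utf-8")
```

Since the child is forced into UTF-8 mode, the explicit `encoding="utf-8"` on the reading side is safe on every platform.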

Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM. 👍

@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) November 3, 2024 13:48
@serhiy-storchaka serhiy-storchaka added the "needs backport to 3.12" (bug and security fixes) and "needs backport to 3.13" (bugs and security fixes) labels Nov 3, 2024
@serhiy-storchaka serhiy-storchaka merged commit dcae5cd into python:main Nov 3, 2024
42 checks passed
@miss-islington-app

Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 3, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 3, 2024
@bedevere-app

bedevere-app bot commented Nov 3, 2024

GH-126361 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the "needs backport to 3.13" (bugs and security fixes) label Nov 3, 2024
@bedevere-app

bedevere-app bot commented Nov 3, 2024

GH-126362 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the "needs backport to 3.12" (bug and security fixes) label Nov 3, 2024
@tomasr8 tomasr8 deleted the pygettext-tests branch November 3, 2024 14:01
serhiy-storchaka pushed a commit that referenced this pull request Nov 3, 2024
serhiy-storchaka pushed a commit that referenced this pull request Nov 3, 2024
Labels
skip news, tests (Tests in the Lib/test dir)