gh-104400: Add more tests to pygettext #108173

tomasr8 · 2023-08-20T15:24:19Z

As suggested here, I'm splitting the PR which updates pygettext into multiple smaller PRs. This first one just adds more comprehensive tests to pygettext without changing any functionality.

The test cases I added exercise message and docstring extraction, which were not covered before. I also added a specific test for file locations.

Instead of having a test case for each possibility, I group the test cases into files on which I run the script and compare the entire script output. This has several advantages:

- Because we check the whole script output, we can be sure that things like file location, flags, line wrapping, etc. are also correct. The current tests only check for the presence of a given msgid, which is insufficient.
- The tests are (imo) more readable when in a single Python file.
- Less code is needed to add/remove tests.
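
For illustration, the whole-output comparison can be sketched roughly as below. This is not the code in the PR: `fake_extract` is a hypothetical stand-in for running pygettext, and `matches_snapshot` is an invented helper name.

```python
import tempfile
from pathlib import Path


def fake_extract(source: str) -> str:
    """Hypothetical stand-in for the tool under test (not pygettext)."""
    return ''.join(f'msgid "{line}"\n' for line in source.splitlines() if line)


def matches_snapshot(input_file: Path) -> bool:
    """Compare the complete generated output with the stored snapshot."""
    # The snapshot lives next to the input file, with a .pot suffix.
    expected = input_file.with_suffix('.pot').read_text(encoding='utf-8')
    actual = fake_extract(input_file.read_text(encoding='utf-8'))
    # Any difference anywhere in the output counts as a failure.
    return expected == actual
```

Adding a test case then only means dropping a new input file and its expected output into the data directory.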

While writing the tests, I actually found more bugs than in the original PR. I documented these in the tests - I don't think it's worth fixing these now as most of these will be fixed automatically once pygettext is converted to use the AST.

Issue: pygettext: use an AST parser instead of a tokenizer #104400

AA-Turner · 2023-08-20T17:15:41Z

@tomasr8 Thanks for opening the new PR! Could you look at the failing tests please?

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/messages.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[682 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[683 chars]\n\n'
Diff is 1325 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/docstrings.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[340 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[341 chars]\n\n'
Diff is 956 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/fileloc.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[294 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[295 chars]\n\n'
Diff is 907 characters long. Set self.maxDiff to None to see it.

tomasr8 · 2023-08-20T17:30:31Z

Hmm, looks like an encoding issue. Hopefully this'll fix it.

AA-Turner · 2023-08-20T18:13:02Z

Looks like the same three tests failed. Do you have access to a Windows computer? If not I should be able to have a look later on.

tomasr8 · 2023-08-20T18:30:17Z

No worries! Luckily I have a Windows machine lying around 😅 The problem is that there's no way to specify the output encoding, so pygettext uses the platform default. This makes it difficult to compare the files because the charset is also part of the header. I'll just normalize it the same way as I do with the creation date.
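
A rough sketch of that kind of normalization, assuming the header lines look like the ones in the failing output above. The function name `normalize_pot_header` and the placeholder date are made up for this example and may differ from what the PR ends up using.

```python
import re


def normalize_pot_header(pot: str) -> str:
    """Make two POT headers comparable across runs and platforms."""
    # Replace the creation timestamp with a fixed placeholder value.
    pot = re.sub(
        r'"POT-Creation-Date: .+?\\n"',
        r'"POT-Creation-Date: 2000-01-01 00:00+0000\\n"',
        pot,
    )
    # Replace whatever charset the platform picked (e.g. cp1252) with UTF-8.
    pot = re.sub(r'charset=[^\\]+\\n', r'charset=UTF-8\\n', pot)
    return pot
```

Both the expected and the actual output would go through the same function before being compared.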

AA-Turner

Thanks! Some comments on the test code:

Lib/test/test_tools/test_i18n/test_i18n.py

Co-authored-by: Adam Turner <[email protected]>

AA-Turner

Thanks!

tomasr8 · 2023-08-29T21:44:58Z

Thanks!

Thanks for the review!

erlend-aasland · 2023-08-30T07:43:07Z

cc. @warsaw who asked for a ping on Discourse :)

serhiy-storchaka · 2024-10-28T18:10:26Z

Lib/test/test_tools/test_i18n/test_i18n.py

@@ -1,6 +1,8 @@
 """Tests to cover the Tools/i18n package"""

 import os
+from pathlib import Path


Is it necessary to use pathlib? Other tests simply use os.path.

There were only a handful of uses of os so I went ahead and replaced them with pathlib which I think improves readability, but I'm happy to revert the change if you prefer to keep os :)

Lib/test/test_tools/test_i18n/test_i18n.py

Lib/test/test_tools/test_i18n/__init__.py

tomasr8 · 2024-10-28T20:36:17Z

I also added a --snapshot-update CLI argument to make it easy to regenerate the snapshots (as is already the case with some ast tests and, more recently, the argparse tests).
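
The general shape of such a switch might look like the sketch below. It is only an approximation under stated assumptions: `update_snapshots` is a placeholder, and the `run_tests` parameter exists purely so the example can be exercised without starting a real test run.

```python
import sys
import unittest


def update_snapshots():
    """Placeholder: regenerate the expected-output files here."""
    print('snapshots regenerated')


def main(argv, run_tests=unittest.main):
    # With the flag, regenerate the snapshots instead of running the tests.
    if len(argv) > 1 and argv[1] == '--snapshot-update':
        update_snapshots()
        return
    run_tests()


if __name__ == '__main__':
    main(sys.argv)
```

Running the test file directly with `--snapshot-update` would then rewrite the snapshots, while any other invocation runs the tests as usual.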

serhiy-storchaka · 2024-11-03T12:28:24Z

Lib/test/test_tools/test_i18n.py

+def update_POT_snapshots():
+    for input_file in DATA_DIR.glob('*.py'):
+        output_file = input_file.with_suffix('.pot')
+        contents = input_file.read_text(encoding='utf-8')


It would be nice to have some files with non-UTF-8 encoding.

Since contents is only used to copy a file, you can read/write the binary content.
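
A byte-for-byte copy avoids decoding altogether, so it also works for input files that are not UTF-8. A minimal sketch, with `copy_verbatim` as an invented helper name:

```python
import tempfile
from pathlib import Path


def copy_verbatim(src: Path, dst: Path) -> None:
    """Copy a file without decoding it, preserving its original encoding."""
    dst.write_bytes(src.read_bytes())
```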

serhiy-storchaka · 2024-11-03T12:28:51Z

Lib/test/test_tools/test_i18n.py

+        with temp_cwd(None):
+            Path(input_file.name).write_text(contents)
+            assert_python_ok(Test_pygettext.script, '--docstrings', input_file.name)
+            output = Path('messages.pot').read_text()


When you read text, always specify the encoding.

This causes problems on Windows, where the encoding is cp1252, so reading it back as UTF-8 fails. I don't know how else to get around this besides forcing pygettext to always output UTF-8 (or adding a configurable parameter). Do you have any suggestions?

We should use -Xutf8 or PYTHONIOENCODING=utf-8 to run pygettext, because the text can be non-encodable with the locale encoding.
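
As a rough illustration of the `-X utf8` option, here is a sketch that runs a child interpreter in UTF-8 mode with plain `subprocess` instead of the test-suite helpers. The helper name `run_in_utf8_mode` is hypothetical.

```python
import subprocess
import sys


def run_in_utf8_mode(code: str) -> str:
    """Run a snippet in a child interpreter with UTF-8 mode enabled."""
    result = subprocess.run(
        [sys.executable, '-X', 'utf8', '-c', code],
        capture_output=True,
        check=True,
    )
    # In UTF-8 mode the child's stdout is UTF-8 regardless of the locale.
    return result.stdout.decode('utf-8')
```

With UTF-8 mode on, files the child opens without an explicit encoding also default to UTF-8, so the output can be read back as UTF-8 on any platform.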

serhiy-storchaka

LGTM. 👍

miss-islington-app · 2024-11-03T14:01:13Z

Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

(cherry picked from commit dcae5cd) Co-authored-by: Tomas R. <[email protected]>

bedevere-app · 2024-11-03T14:01:25Z

GH-126361 is a backport of this pull request to the 3.13 branch.

bedevere-app · 2024-11-03T14:01:30Z

GH-126362 is a backport of this pull request to the 3.12 branch.

(cherry picked from commit dcae5cd) Co-authored-by: Tomas R <[email protected]>

Add more tests for pygettext

06d86eb

bedevere-bot mentioned this pull request Aug 20, 2023

pygettext: use an AST parser instead of a tokenizer #104400

Open

bedevere-bot added the awaiting review label Aug 20, 2023

tomasr8 mentioned this pull request Aug 20, 2023

gh-104400: pygettext: use an AST parser instead of a tokenizer #104402

Draft

tomasr8 added 2 commits August 20, 2023 19:28

Specify file encoding

b1b0892

Normalize charset

eb7f488

AA-Turner added the skip news label Aug 20, 2023

AA-Turner reviewed Aug 21, 2023

tomasr8 and others added 2 commits August 21, 2023 20:20

Apply suggestions from code review

7428393

Co-authored-by: Adam Turner <[email protected]>

Apply suggestions from code review

f06cbb5

AA-Turner added the tests Tests in the Lib/test dir label Aug 28, 2023

AA-Turner approved these changes Aug 28, 2023

bedevere-bot added awaiting core review and removed awaiting review labels Aug 28, 2023

erlend-aasland requested a review from warsaw August 30, 2023 07:42

tomasr8 mentioned this pull request Oct 9, 2023

gettext: remove unecessary test cases testing single/double quotes #107510

Open

Merge branch 'main' into pygettext-tests

f4b7955

serhiy-storchaka self-requested a review October 9, 2023 16:25

erlend-aasland and others added 5 commits December 4, 2023 11:53

Merge branch 'main' into pygettext-tests

c6cb8b9

Merge branch 'main' into pygettext-tests

6a76d97

Merge branch 'main' into pygettext-tests

ebcc6ea

Merge branch 'main' into pygettext-tests

9dbc1c7

Merge branch 'main' into pygettext-tests

1c3d46a

serhiy-storchaka reviewed Oct 28, 2024

tomasr8 added 7 commits October 28, 2024 21:22

Simplify test

a5501b8

Extract POT normalization into a function

e6b8c80

Add a CLI command to regenerate snapshots

5fba1bb

Regenerate snapshots

9f388af

Simplify code

88f6350

Set maxDiff to None

f4ed4e4

Add test dir to Makefile

63eef00

tomasr8 requested a review from erlend-aasland as a code owner October 28, 2024 20:31

tomasr8 requested a review from serhiy-storchaka November 3, 2024 12:14

serhiy-storchaka reviewed Nov 3, 2024

Use '-Xutf8'

c26d488

serhiy-storchaka approved these changes Nov 3, 2024

bedevere-app bot added awaiting merge and removed awaiting core review labels Nov 3, 2024

serhiy-storchaka enabled auto-merge (squash) November 3, 2024 13:48

serhiy-storchaka added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Nov 3, 2024

serhiy-storchaka merged commit dcae5cd into python:main Nov 3, 2024
42 checks passed

bedevere-app bot removed the awaiting merge label Nov 3, 2024

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 3, 2024

pythongh-104400: Add more tests to pygettext (pythonGH-108173)

3422519

(cherry picked from commit dcae5cd) Co-authored-by: Tomas R. <[email protected]>

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 3, 2024

pythongh-104400: Add more tests to pygettext (pythonGH-108173)

26bb10d

(cherry picked from commit dcae5cd) Co-authored-by: Tomas R. <[email protected]>

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 3, 2024

bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Nov 3, 2024

tomasr8 deleted the pygettext-tests branch November 3, 2024 14:01

serhiy-storchaka pushed a commit that referenced this pull request Nov 3, 2024

[3.12] gh-104400: Add more tests to pygettext (GH-108173) (GH-126362)

b0e08f5

(cherry picked from commit dcae5cd) Co-authored-by: Tomas R <[email protected]>

serhiy-storchaka pushed a commit that referenced this pull request Nov 3, 2024

[3.13] gh-104400: Add more tests to pygettext (GH-108173) (GH-126361)

86d6c68

(cherry picked from commit dcae5cd) Co-authored-by: Tomas R <[email protected]>
