Implement baselines as a fixture and with simple rebase support #1732

uartie · 2025-01-29T19:51:04Z

What does this PR do?

Implement a Baseline fixture to manage test case baselines via reference files instead of directly in the test code.

This makes rebasing a reference easier: pass the custom pytest option "--rebase" on the command line to update the reference with the current test result.

It will also enable us to remove "baseline" from the test case signatures/names, so they don't generate new signatures every time the baseline is updated or whenever floating-point precision differs from run to run.

A static/fixed test case signature ensures we can track the historical results of the test case in various tools/CI.

This is the first pull request in a multi-part series to refactor how baselines are managed. This PR specifically addresses the test cases that used "baseline" in the test case signature. Future series will focus on moving the hard-coded baseline constants that exist in other test files.
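As a rough illustration of the idea described above (not the code in this PR), a reference-file helper with a rebase mode might look like the sketch below. The class name `Baseline`, the method `check`, the JSON layout, and the `rel_tol` default are all assumptions made for the example.

```python
import json
import math
from pathlib import Path


class Baseline:
    """Hypothetical helper: keeps reference values for one test in a JSON file."""

    def __init__(self, path, rebase=False):
        self.path = Path(path)
        self.rebase = rebase
        # Load existing references, if the file is already there.
        self.refs = json.loads(self.path.read_text()) if self.path.exists() else {}

    def check(self, key, actual, rel_tol=0.05):
        """Compare `actual` to the stored reference, or overwrite it when rebasing."""
        if self.rebase:
            # Rebase mode: the current result becomes the new reference.
            self.refs[key] = actual
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(json.dumps(self.refs, indent=2, sort_keys=True))
            return
        assert key in self.refs, f"no reference for {key!r}; rerun with --rebase"
        expected = self.refs[key]
        assert math.isclose(actual, expected, rel_tol=rel_tol), (
            f"{key}: got {actual}, expected {expected} (rel_tol={rel_tol})"
        )
```

In a conftest.py, a fixture could build such an object per test, registering the flag with `parser.addoption("--rebase", action="store_true")` inside `pytest_addoption` and reading it with `request.config.getoption("--rebase")`. Because the reference lives in a file rather than in the parametrization, the test's nodeid does not change when the reference does.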

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

jiminha · 2025-01-30T19:03:48Z

@uartie This looks great! I like that the baseline is separated from the test script itself, and the --rebase option seems really convenient for updating the baseline. I noticed that test_examples.py already uses baselines/*.json. Would your code be able to walk through those as well?

jiminha · 2025-01-30T19:07:09Z

@regisss We are working on upgrading our test framework to enable further automation. This is one of the first changes, and more will come. Could you review this?

uartie · 2025-01-30T20:53:02Z

> @uartie This looks great! I like that the baseline is separated from the test script itself, and the --rebase option seems really convenient for updating the baseline. I noticed that test_examples.py already uses baselines/*.json. Would your code be able to walk through those as well?

The test_examples.py baseline json files are a combination of test configuration parameters and test result references. I will explore ways to separate and integrate those with the new baseline fixture in a future patch series.

The immediate goal in this series is to fix test case signature/name variability via the new baseline fixture support. This will make it easier to manage separate, external test case manifest files.

libinta · 2025-02-05T00:43:51Z

@uartie can you provide the test log with this change?

uartie · 2025-02-06T23:55:32Z

> @uartie can you provide the test log with this change?

@libinta The test results were the same with and without this PR.

+ python -m pytest tests/test_fsdp_examples.py -v -s --token=***
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
collected 2 items
=================== 1 failed, 1 passed in 586.73s (0:09:46) ====================

+ python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s --token=***
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
collected 66 items
================== 8 failed, 58 passed in 26648.24s (7:24:08) ==================

+ python -m pytest tests/test_image_to_text_example.py -v -s --token=***
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
collected 12 items
================== 2 failed, 10 passed in 2001.76s (0:33:21) ===================

+ python -m pytest tests/test_openclip_vqa.py -v -s
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
collected 2 items
======================== 2 passed in 200.33s (0:03:20) =========================

+ python -m pytest tests/test_sentence_transformers.py
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
collected 13 items
================== 13 passed, 1 warning in 180.71s (0:03:00) ===================

+ python -m pytest tests/test_pipeline.py -v -s --token=***
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
collected 5 items
=================== 5 passed, 1 warning in 164.29s (0:02:44) ===================

HuggingFaceDocBuilderDev · 2025-02-07T20:20:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss

Awesome PR 🚀
LGTM.

I left one comment, but it doesn't block merging the PR now.

tests/test_pipeline.py

uartie requested a review from regisss as a code owner January 29, 2025 19:51

uartie force-pushed the baseline-fixture branch 3 times, most recently from 2f7d969 to ab608b7 Compare January 30, 2025 16:28

uartie force-pushed the baseline-fixture branch 2 times, most recently from fa89345 to 26f194b Compare February 3, 2025 22:17

jiminha requested review from libinta and jiminha February 4, 2025 21:42

jiminha approved these changes Feb 5, 2025

libinta added the transformers_future label Feb 5, 2025

uartie force-pushed the baseline-fixture branch from 21f21a4 to 26cf0d3 Compare February 5, 2025 14:59

uartie added 2 commits February 5, 2025 17:37

Define token as a fixture

231d4ab

Fixture parameters in test cases are not emitted in the test signature (i.e. nodeid). Define the token as a fixture so that it does not generate an arbitrary representation in the test signature. Signed-off-by: U. Artie Eoff <[email protected]>
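To illustrate the distinction this commit message draws, here is a minimal conftest.py sketch, not the actual change in 231d4ab. The helper name `read_token`, the option default, and the help text are assumptions made for the example; `parser.addoption`, `config.getoption`, and `pytest.fixture` are standard pytest APIs.

```python
# conftest.py (illustrative sketch)
import pytest


def pytest_addoption(parser):
    # Register a --token command-line option; the default is an assumption.
    parser.addoption("--token", action="store", default=None, help="access token")


def read_token(config):
    # Plain helper (hypothetical name) so the lookup can be exercised on its own.
    return config.getoption("--token")


@pytest.fixture
def token(request):
    # Tests request this by naming `token` as an argument. A fixture value is
    # resolved at run time and does not appear in the bracketed part of the
    # test's nodeid, unlike a value supplied through parametrization.
    return read_token(request.config)
```

With this shape, a test declared as `def test_example(token): ...` keeps the nodeid `test_file.py::test_example` regardless of what is passed to `--token`.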

uartie force-pushed the baseline-fixture branch from 26cf0d3 to 3399119 Compare February 5, 2025 22:38

uartie added 8 commits February 6, 2025 09:27

test_text_generation_example: use baseline fixture

7653e92

Use the new baseline fixture to validate test results. Signed-off-by: U. Artie Eoff <[email protected]>

test_encoder_decoder: use baseline fixture

8e124fc

Use the new baseline fixture to validate test results. Also, remove dead code (e.g. token not used). Signed-off-by: U. Artie Eoff <[email protected]>

test_fp8_examples: use baseline fixture

b907b1d

Use the new baseline fixture to validate test results. Signed-off-by: U. Artie Eoff <[email protected]>

test_fsdp_examples: use baseline fixture

81acda5

Use the new baseline fixture to validate test results. Signed-off-by: U. Artie Eoff <[email protected]>

test_image_to_text_example: use baseline fixture

ffc3cb5

Use the new baseline fixture to validate test results. Signed-off-by: U. Artie Eoff <[email protected]>

test_openclip_vqa: use baseline fixture

decea4c

Use the new baseline fixture to validate test results. Signed-off-by: U. Artie Eoff <[email protected]>

test_sentence_transformers: use baseline fixture

3c9a044

Use the new baseline fixture to validate test results. Signed-off-by: U. Artie Eoff <[email protected]>

test_pipeline: use baseline fixture

c24b8da

Use the new baseline fixture to validate test results. Signed-off-by: U. Artie Eoff <[email protected]>

uartie force-pushed the baseline-fixture branch from 3399119 to c24b8da Compare February 6, 2025 14:28

libinta added the run-test label (Run CI for PRs from external contributors) Feb 6, 2025

regisss approved these changes Feb 7, 2025

regisss merged commit 9e882f2 into huggingface:main Feb 7, 2025
4 checks passed
