[OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system #43965

potiuk · 2024-11-13T14:04:21Z

The tests execution was traditionally using single "breeze testing tests" command and you could select which test to run via TEST_TYPE.

However the recent move of providers and adding task_sdk necessitates splitting the tests commands into separate commands for core, providers, task_sdk, helm, integration and system.

This is done via introducing "TEST_GROUP" - which determines which group of tests is being executed, and dedicated testing command for each of the groups - with "db" and "non-db" variants where applicable.

Cleanup and small refactoring has been done to make it easier to reason about parameters passed down from the command line to docker and in-container pytest command.

Related: #42632

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

potiuk · 2024-11-13T14:30:04Z

Also cc: @ahidalgob @kosteev @pankajkoti @fdemiane @sc250072 except the reviewers that I pinged.

This is something I worked on for last few days as first stage of #42632 - it will be quite big eventually and it is still work in progress (I am still working on fixing all the tests in breeze selective checks and need to apply changes to github workflows to run the tests but wanted to give you a heads up of what I am proposing as a change.

So do not (yet) comment on subtle details - that will come, but I'd love more to get a heads-up on the whole concept that I am planning to implement for our tests, some simplifications it will bring and impact on the system tests execution (because that is something not yet in our CI and something that should be applied as a change when you will run system tests of Amazon, Google, Astronomer and Teradata people.

Few general things:

The set of testing commands we will be able to use is this:

This means that we can separate out providers/task_sdk, core tests as completely different jobs in CI (also it will make it easier to run locally.

I introduced the concept of "Group Of Tests" - previously the test types (All/Core/Other/...) in the core were mixed with Helm or System test type and it was very confusing. Now we have test group ('Core, Providers, Integration-Core, Integration-Providers, System-Core, System-Providers, Helm) - and each group has it's own "test types" defined (so "Other/Always etc. " are valid for "Core Test Group", but not valid for "Providers Test Group" - also you cannot specify "Providers*" test type for anything else than one of Providers test group.
The "options" for each of the test commands are now much more consistent and definition of groups of those options is now much more composable and logical. Example "providers-tests" set of options that are grouped accorting to test type (db/non-db, providers/core etc). For example DB operations are available for the "general" and "db" test scopes and not available for the "non-db" scope.

I am still working on cleaning up and restructuring the code a bit - but generally code for different test groups to produce the right pytest command will be separate "per group" when I finish.
the overall idea for the change is to be able to keep the same "selective check" optimizations that allow to run only subset of changes for each change and to allo to run tests in parallel on bigger machines (once we have ARC running cc: @hussein-awala ).
Running system tests after this is merged should look this way:

breeze testing system-providers-tests ./providers/tests/system/amazon/aws/tests/test_aws_auth_manager.py

(plus the usual parameters - env vars etc. configured).

Let me know what you think - high level - I will work on fixing the tests and adding missing changes and updating docs accordingly in the meantime.

potiuk · 2024-11-13T14:31:15Z

And yes - it's huge - I know, but I have not figured out how to split it.

potiuk · 2024-11-13T15:21:25Z

!!!!!!!!!! Please, Please, Pretty please - committers do not approve the workflow. This is an example for Github Ticket I opened to show that I need to approve my own workflows now. !!!!!!!
😱 😱 😱 😱 😱 😱 😱 😱 😱 😱 😱 😱 😱

vincbeck · 2024-11-13T15:45:52Z

I have not looked at the code but only at the description. I like the direction it is taking, the notion of test group makes sense. Having more granular test commands makes sense and will reduce the number of options we have today for the breeze testing tests command. As mentioned in the description, some of these options do not apply for certain tests and managing all the different combinations of these options is cumbersome. So I guess we'll gain a lot of simplification there

The tests execution was traditionally using single "breeze testing tests" command and you could select which test to run via TEST_TYPE. However the recent move of providers and adding task_sdk necessitates splitting the tests commands into separate commands for core, providers, task_sdk, helm, integration and system. This is done via introducing "TEST_GROUP" - which determines which group of tests is being executed, and dedicated testing command for each of the groups - with "db" and "non-db" variants where applicable. Cleanup and small refactoring has been done to make it easier to reason about parameters passed down from the command line to docker and in-container pytest command. Related: apache#42632

o-nikolas · 2024-11-13T17:47:06Z

Looks good so far!
Do we need to make the db/non-db part of the command (e.g. core-tests, core-db-tests, core-non-db-tests)? It feels like we should have just one group command and then an option to toggle db/non-db/both.

Other than that I like the changes!

potiuk · 2024-11-13T22:32:06Z

Do we need to make the db/non-db part of the command (e.g. core-tests, core-db-tests, core-non-db-tests)? It feels like we should have just one group command and then an option to toggle db/non-db/both.

It makes it quite a bit easier to handle. DB paralel tests are run using multi-processing and running parallel docker-composes. Where Non-DB parallel tests are run using xdist. This makes the two commands pretty distinct - they not only behave diferently but also they have various other parameters applicable on in one of those cases (mostly the DB).

Those are the options that only make senses for DB tests:

--db-reset
--backend
--no-db-cleanup
--postgres-version
--mysql-version

Thanks to that teh non-db-tests have way smaller set of parameter than db-tests.

Technically we could pass all the necessary parametters for non-db option in the "regular" (that handles both db and non-db) - parallelism, backend=none and few others, But then that makes non-db commands way simpler when we select which command to use based on parameters passed - for example:

    if [[ "${TEST_SCOPE}" == "DB" ]]; then
        set -x
        breeze testing core-db-tests
        set +x
    elif [[ "${TEST_SCOPE}" == "Non-DB" ]]; then
        set -x
        breeze testing core-non-db-tests
        set +x

I experimented with it before, and that one produced the nicest and simplest set of commands to execute without specifying too many parameters.

potiuk · 2024-11-14T11:12:31Z

I moved the change to #43979. Closing this one - seems workflows have been run eventualy BTW

boring-cyborg bot added the area:dev-tools label Nov 13, 2024

potiuk requested review from ashb, kaxil, gopidesupavan, jedcunningham, o-nikolas and ferruzzi November 13, 2024 14:04

potiuk requested review from eladkal, ephraimbuddy and jscheffl November 13, 2024 14:30

potiuk force-pushed the split-tests-to-core-providers-task-sdk branch from cceee29 to 7d1f2fb Compare November 13, 2024 16:55

potiuk changed the title ~~Split tests to core/providers/task-sdk/integration/system~~ [OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system Nov 14, 2024

potiuk closed this Nov 14, 2024

potiuk mentioned this pull request Nov 14, 2024

Split tests to core/providers/task-sdk/integration/system #43979

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system #43965

[OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system #43965

potiuk commented Nov 13, 2024

potiuk commented Nov 13, 2024

potiuk commented Nov 13, 2024

potiuk commented Nov 13, 2024

vincbeck commented Nov 13, 2024

o-nikolas commented Nov 13, 2024

potiuk commented Nov 13, 2024

potiuk commented Nov 14, 2024

[OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system #43965

[OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system #43965

Conversation

potiuk commented Nov 13, 2024

potiuk commented Nov 13, 2024

potiuk commented Nov 13, 2024

potiuk commented Nov 13, 2024

vincbeck commented Nov 13, 2024

o-nikolas commented Nov 13, 2024

potiuk commented Nov 13, 2024

potiuk commented Nov 14, 2024