Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system #43965

Closed

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Nov 13, 2024

The tests execution was traditionally using single "breeze testing tests" command and you could select which test to run via TEST_TYPE.

However the recent move of providers and adding task_sdk necessitates splitting the tests commands into separate commands for core, providers, task_sdk, helm, integration and system.

This is done via introducing "TEST_GROUP" - which determines which group of tests is being executed, and dedicated testing command for each of the groups - with "db" and "non-db" variants where applicable.

Cleanup and small refactoring has been done to make it easier to reason about parameters passed down from the command line to docker and in-container pytest command.

Related: #42632


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Nov 13, 2024

Also cc: @ahidalgob @kosteev @pankajkoti @fdemiane @sc250072 except the reviewers that I pinged.

This is something I worked on for last few days as first stage of #42632 - it will be quite big eventually and it is still work in progress (I am still working on fixing all the tests in breeze selective checks and need to apply changes to github workflows to run the tests but wanted to give you a heads up of what I am proposing as a change.

So do not (yet) comment on subtle details - that will come, but I'd love more to get a heads-up on the whole concept that I am planning to implement for our tests, some simplifications it will bring and impact on the system tests execution (because that is something not yet in our CI and something that should be applied as a change when you will run system tests of Amazon, Google, Astronomer and Teradata people.

Few general things:

  1. The set of testing commands we will be able to use is this:

image

This means that we can separate out providers/task_sdk, core tests as completely different jobs in CI (also it will make it easier to run locally.

  1. I introduced the concept of "Group Of Tests" - previously the test types (All/Core/Other/...) in the core were mixed with Helm or System test type and it was very confusing. Now we have test group ('Core, Providers, Integration-Core, Integration-Providers, System-Core, System-Providers, Helm) - and each group has it's own "test types" defined (so "Other/Always etc. " are valid for "Core Test Group", but not valid for "Providers Test Group" - also you cannot specify "Providers*" test type for anything else than one of Providers test group.

  2. The "options" for each of the test commands are now much more consistent and definition of groups of those options is now much more composable and logical. Example "providers-tests" set of options that are grouped accorting to test type (db/non-db, providers/core etc). For example DB operations are available for the "general" and "db" test scopes and not available for the "non-db" scope.

image

  1. I am still working on cleaning up and restructuring the code a bit - but generally code for different test groups to produce the right pytest command will be separate "per group" when I finish.

  2. the overall idea for the change is to be able to keep the same "selective check" optimizations that allow to run only subset of changes for each change and to allo to run tests in parallel on bigger machines (once we have ARC running cc: @hussein-awala ).

  3. Running system tests after this is merged should look this way:

breeze testing system-providers-tests ./providers/tests/system/amazon/aws/tests/test_aws_auth_manager.py

(plus the usual parameters - env vars etc. configured).

Let me know what you think - high level - I will work on fixing the tests and adding missing changes and updating docs accordingly in the meantime.

@potiuk
Copy link
Member Author

potiuk commented Nov 13, 2024

And yes - it's huge - I know, but I have not figured out how to split it.

@potiuk
Copy link
Member Author

potiuk commented Nov 13, 2024

!!!!!!!!!! Please, Please, Pretty please - committers do not approve the workflow. This is an example for Github Ticket I opened to show that I need to approve my own workflows now. !!!!!!!
😱 😱 😱 😱 😱 😱 😱 😱 😱 😱 😱 😱 😱

@vincbeck
Copy link
Contributor

I have not looked at the code but only at the description. I like the direction it is taking, the notion of test group makes sense. Having more granular test commands makes sense and will reduce the number of options we have today for the breeze testing tests command. As mentioned in the description, some of these options do not apply for certain tests and managing all the different combinations of these options is cumbersome. So I guess we'll gain a lot of simplification there

The tests execution was traditionally using single "breeze testing
tests" command and you could select which test to run via TEST_TYPE.

However the recent move of providers and adding task_sdk necessitates
splitting the tests commands into separate commands for core, providers,
task_sdk, helm, integration and system.

This is done via introducing "TEST_GROUP" - which determines which
group of tests is being executed, and dedicated testing command for each
of the groups - with "db" and "non-db" variants where applicable.

Cleanup and small refactoring has been done to make it easier to
reason about parameters passed down from the command line to
docker and in-container pytest command.

Related: apache#42632
@potiuk potiuk force-pushed the split-tests-to-core-providers-task-sdk branch from cceee29 to 7d1f2fb Compare November 13, 2024 16:55
@o-nikolas
Copy link
Contributor

Looks good so far!
Do we need to make the db/non-db part of the command (e.g. core-tests, core-db-tests, core-non-db-tests)? It feels like we should have just one group command and then an option to toggle db/non-db/both.

Other than that I like the changes!

@potiuk
Copy link
Member Author

potiuk commented Nov 13, 2024

Do we need to make the db/non-db part of the command (e.g. core-tests, core-db-tests, core-non-db-tests)? It feels like we should have just one group command and then an option to toggle db/non-db/both.

It makes it quite a bit easier to handle. DB paralel tests are run using multi-processing and running parallel docker-composes. Where Non-DB parallel tests are run using xdist. This makes the two commands pretty distinct - they not only behave diferently but also they have various other parameters applicable on in one of those cases (mostly the DB).

Those are the options that only make senses for DB tests:

  • --db-reset
  • --backend
  • --no-db-cleanup
  • --postgres-version
  • --mysql-version

Thanks to that teh non-db-tests have way smaller set of parameter than db-tests.

Technically we could pass all the necessary parametters for non-db option in the "regular" (that handles both db and non-db) - parallelism, backend=none and few others, But then that makes non-db commands way simpler when we select which command to use based on parameters passed - for example:

    if [[ "${TEST_SCOPE}" == "DB" ]]; then
        set -x
        breeze testing core-db-tests
        set +x
    elif [[ "${TEST_SCOPE}" == "Non-DB" ]]; then
        set -x
        breeze testing core-non-db-tests
        set +x

I experimented with it before, and that one produced the nicest and simplest set of commands to execute without specifying too many parameters.

@potiuk potiuk changed the title Split tests to core/providers/task-sdk/integration/system [OLD VERSION DO NOT MERGE] Split tests to core/providers/task-sdk/integration/system Nov 14, 2024
@potiuk
Copy link
Member Author

potiuk commented Nov 14, 2024

I moved the change to #43979. Closing this one - seems workflows have been run eventualy BTW

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants