TEST: test coverage for sklearnex SPMD ifaces #1777
Conversation
/intelci: run
sklearnex/spmd/basic_statistics/tests/test_basic_statistics_spmd.py (outdated; review threads resolved)
…er/scikit-learn-intelex into dev/eglaser-pytest-mpi
Job with infra branch: http://intel-ci.intel.com/eefc46ad-f4dd-f157-a13b-a4bf010d0e2e
Update job with infra branch: http://intel-ci.intel.com/ef02681e-c97c-f1ba-a9d0-a4bf010d0e2e
Overall looks good to me. Thank you!
Two questions:
- I don't see skips in the test suites for the known issues; is it possible to mark the defects by disabling the related test cases?
- Does it work when the number of processes used to run the test is < 2? Does it make sense to add @pytest.mark.mpi(min_size=2)?
spmd_result = EmpiricalCovariance_SPMD().fit(local_dpt_data)
batch_result = EmpiricalCovariance_Batch().fit(data)

assert_allclose(spmd_result.covariance_, batch_result.covariance_)
I see that you are mixing numpy's assert_allclose with your own _spmd_assert_allclose. Why not use the latter everywhere?
Certain attributes and results require it and others do not. For example, with linear regression, spmd inference is done on chunks of data (whereas batch runs on the entire data), so the spmd version is used there; the coefficients, on the other hand, are universal and take the regular assert_allclose.
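For readers following along, the distinction could look roughly like this (the helper name matches the discussion, but this chunking scheme is an illustrative assumption, not the actual implementation):

```python
import numpy as np
from numpy.testing import assert_allclose


def _spmd_assert_allclose(spmd_local, batch_full, rank, size, **kwargs):
    # Hypothetical helper: an SPMD rank only computes results for its own
    # chunk of rows, so compare against the matching slice of the batch
    # result rather than the full array.
    local_chunk = np.array_split(batch_full, size)[rank]
    assert_allclose(spmd_local, local_chunk, **kwargs)


# Per-chunk results (e.g. predictions) need the chunked comparison ...
full = np.arange(12, dtype=np.float64).reshape(6, 2)
_spmd_assert_allclose(full[3:], full, rank=1, size=2)

# ... while global attributes (e.g. coefficients) are identical on every
# rank and can use plain assert_allclose directly.
assert_allclose(full.mean(axis=0), full.mean(axis=0))
```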
Also, it is not clear how you run these tests.
https://pytest-mpi.readthedocs.io/en/latest/usage.html#
I think it makes sense to add a README / update the docs to describe how to run these tests.
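Following the pytest-mpi usage docs linked above, the launch would presumably look something like the following (the rank count and test path here are illustrative):

```shell
# Run pytest on 4 MPI ranks; --with-mpi enables tests carrying the
# pytest-mpi "mpi" marker.
mpirun -n 4 python -m pytest --with-mpi \
    sklearnex/spmd/basic_statistics/tests/test_basic_statistics_spmd.py
```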
I also see that different dtypes are not tested here. Is this parametrization skipped on purpose?
Not on purpose - I will add it in. Good point.
Currently only added to internal CI since this is where we have GPU validation - see infra PR 712. Not sure if it makes sense to document in public repo.
There are pytest.skip() calls within the test functions - mostly because the skips depend on the parameters passed to the function (except for forest).
Infra sets up the tests to run with 4 processes, but with 1 process it would just run batch GPU, so I don't see why it would not work. All tests are set up to skip if mpi or gpu is not present.
We are adding a testing module to the public repository. It seems to me it makes sense to add a doc with info on how to launch it, including some links to mpi-pytest.
Yes, I saw it before. My question arose because it is not everywhere. For example, for KMeans I did not find the skip for the issue specified in the description.
Looks good to me. Just minor comments that could probably be addressed in separate PRs.
Assuming green CI.
True - in some cases it's not handled by pytest.skip(); instead, parts of the test are commented out. For KMeans, for instance, the iter check and the init check are commented out, because I want to keep some validation. If there is a good way to handle this with pytest.skip instead, I am open to other ideas.
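One possible alternative to commenting checks out (all names below are illustrative, not the repo's actual tests): split the fragile assertion into its own test and skip only that one, so the remaining validation still runs and the known issue stays visible in the test report.

```python
import pytest


def _fit_kmeans():
    # Placeholder for the real SPMD-vs-batch fit; returns fake results here.
    return {"labels_match": True, "n_iter_match": False}


def test_kmeans_labels():
    # The healthy checks stay active in their own test.
    assert _fit_kmeans()["labels_match"]


def test_kmeans_n_iter():
    # The known-broken check is isolated and skipped with a tracking note,
    # instead of being commented out inside a larger test.
    pytest.skip("known issue: SPMD n_iter_ differs from batch")
    assert _fit_kmeans()["n_iter_match"]
```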
I will update the SPMD docs task to include this - we currently have no documentation of our spmd interfaces at all, so I don't know if it could be added yet. Thanks for the reviews. Still working out some CI fixes with the introduction of float32, but will share a job once it's cleaner.
I see some test failures in the latest CI job provided.
Co-authored-by: olegkkruglov <[email protected]>
* Add simple spmd pytest for basic stats
* add reason to skipif
* blacked
* isorted
* import revisions
* adding common spmd test functionality
* add simple covariance mpi pytest
* add linreg and auto data tests
* first draft complete
* follow-ups, TODOs, formatting
* fix knn manual tests
* deselect pca issues
* address minor CI fails
* improvements to _spmd_support.py
* get_local_tensor cleanup
* black
* add random state to statistic data gen
* manual to gold
* add logreg skip
* isort
* oops
* address some comments
* relative imports for _utils_spmd
* underscore prefix for test functions
* add underscore prefix to variable
* black
* revert relative import, use sklearnex
* revert unordered logic back to original
* black
* add dataframes support to testing
* black formatting
* black + minor fix
* oops
* skips for latest fails
* address comments, move _as_numpy usage to utils
* trying logreg gpu vs spmd instead of cpu
* minor follow up to logreg
* basicstats API upd changes, logreg thresholds
* oops
* oops neighbors
* loosen logreg threshold
* cleanup
* minor restorations and formatting
* add dtype parameter
* dtype uniform revision
* float32 threshold updates pt 1
* float32 threshold updates pt 2
* float32 threshold updates pt 3
* black
* skip neighbors check for float32
* final neighbors threshold update
* Update sklearnex/tests/_utils_spmd.py

Co-authored-by: olegkkruglov <[email protected]>

---------

Co-authored-by: olegkkruglov <[email protected]>
Description
Scope:
For each sklearnex spmd algorithm, individual manually-created tests and parametrized synthetic tests validate model attributes and prediction results against the batch implementations.
Known issues to address in follow-up: