
TEST: test coverage for sklearnex SPMD ifaces #1777

Merged
merged 71 commits into uxlfoundation:main on Aug 1, 2024

Conversation

@ethanglaser (Contributor) commented on Mar 26, 2024

Description

Scope:

  • Introduces initial validation of SPMD algorithms via mpi-pytest
  • Identifies and creates follow-up tasks for issues found in initial testing

For each sklearnex SPMD algorithm, individual manually created tests and parametrized synthetic tests validate model attributes and prediction results against the batch implementations (a rough sketch of this pattern is included after the list of known issues below).

Known issues to address in follow-up:

  • KMeans centers not aligned after init
  • Number of KMeans iterations does not align
  • PCA falls back if n_components is 'mle' or an integer
  • PCA fails if n_rows_rank < n_cols < n_rows
  • Forest results are not aligned
  • LinReg fails if n_rows_rank < n_cols < n_rows
  • LogReg model coefficients do not align
  • LogReg fails if n_rows_rank < n_cols < n_rows
  • kNN needs all classes represented on each process initially
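
For illustration, a rough sketch of this SPMD-vs-batch validation pattern (a sketch only: EmpiricalCovariance_SPMD / EmpiricalCovariance_Batch stand for an SPMD estimator and its batch counterpart, and _generate_synthetic_data / _get_local_chunk are hypothetical helpers, not the utilities added in this PR):

import pytest
from numpy.testing import assert_allclose

@pytest.mark.mpi
@pytest.mark.parametrize("n_samples", [200, 1000])
def test_covariance_spmd_matches_batch_sketch(n_samples):
    # hypothetical helpers: build the full synthetic dataset on every rank,
    # then keep only this rank's row chunk for the SPMD estimator
    data = _generate_synthetic_data(n_samples, n_features=8)
    local_data = _get_local_chunk(data)
    spmd_result = EmpiricalCovariance_SPMD().fit(local_data)
    batch_result = EmpiricalCovariance_Batch().fit(data)
    # attributes computed collectively should match the batch fit
    assert_allclose(spmd_result.covariance_, batch_result.covariance_)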

@ethanglaser (Contributor Author):
/intelci: run

@ethanglaser (Contributor Author):
/intelci: run

@samir-nasibli changed the title from "CI: Add mpi pytest validation" to "TST: Add mpi pytest validation" on Apr 3, 2024
@ethanglaser changed the title from "TST: Add mpi pytest validation" to "TEST: Add mpi pytest validation" on Apr 17, 2024
@ethanglaser (Contributor Author):
Update job with infra branch: http://intel-ci.intel.com/ef02681e-c97c-f1ba-a9d0-a4bf010d0e2e

@samir-nasibli (Contributor) left a comment:
Overall looks good to me. Thank you!

Two questions:

  1. I don't see skips in the test suites for the known issues; is it possible to mark the defects by disabling the related test cases?
  2. Does it work when the number of processes used to run the test is < 2?
     Does it make sense to add @pytest.mark.mpi(min_size=2)?

# fit the SPMD estimator on this rank's local data chunk and compare
# against a batch fit on the full dataset
spmd_result = EmpiricalCovariance_SPMD().fit(local_dpt_data)
batch_result = EmpiricalCovariance_Batch().fit(data)

assert_allclose(spmd_result.covariance_, batch_result.covariance_)
Inline review comment (Contributor):
I see that you are mixing NumPy's assert_allclose with your own _spmd_assert_allclose. Why not use the latter everywhere?

@ethanglaser (Contributor Author):
Certain attributes and results require it and others do not. For example, with linear regression, SPMD inference runs on each process's chunk of the data (whereas batch runs on the entire dataset), so the SPMD assert is used for predictions, while the coefficients are the same on every process and use the regular assert_allclose.
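
A minimal sketch of what such a per-rank assert could look like (an assumption about the approach, not the actual _spmd_assert_allclose in this PR), assuming the batch result corresponds to an even row split across ranks:

import numpy as np
from mpi4py import MPI
from numpy.testing import assert_allclose

def _spmd_assert_allclose_sketch(spmd_local_result, batch_full_result, **kwargs):
    # each rank compares its local chunk of the SPMD output against the
    # matching row slice of the batch output
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    expected = np.array_split(np.asarray(batch_full_result), size)[rank]
    assert_allclose(np.asarray(spmd_local_result), expected, **kwargs)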

@samir-nasibli (Contributor) left a comment:
Also, it is not clear how these tests are meant to be run.
https://pytest-mpi.readthedocs.io/en/latest/usage.html#

I think it makes sense to add a README / update the docs describing how to run these tests.

@olegkkruglov (Contributor):
I also see that different dtypes are not tested here. Was this parametrization skipped on purpose?

@ethanglaser (Contributor Author):
> I also see that different dtypes are not tested here. Was this parametrization skipped on purpose?

Not on purpose - I will add it in. Good point.
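
For reference, a minimal sketch of what the dtype parametrization could look like (the helper, the tolerance values, and _run_spmd_vs_batch_check are illustrative assumptions, not this PR's code):

import numpy as np
import pytest

@pytest.mark.parametrize("dtype", [np.float64, np.float32])
def test_spmd_dtype_sketch(dtype):
    # hypothetical helper; cast the synthetic data to the requested dtype
    data = _generate_synthetic_data(1000, n_features=8).astype(dtype)
    # float32 typically needs looser comparison tolerances than float64
    rtol = 1e-4 if dtype == np.float32 else 1e-7
    _run_spmd_vs_batch_check(data, rtol=rtol)  # hypothetical SPMD-vs-batch comparison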

@ethanglaser (Contributor Author):
> Also, it is not clear how these tests are meant to be run. https://pytest-mpi.readthedocs.io/en/latest/usage.html#
>
> I think it makes sense to add a README / update the docs describing how to run these tests.

Currently this is only added to internal CI, since that is where we have GPU validation - see infra PR 712. Not sure if it makes sense to document it in the public repo.

@ethanglaser (Contributor Author):
> 1. I don't see skips in the test suites for the known issues; is it possible to mark the defects by disabling the related test cases?

There are pytest.skip() calls within the test functions - mostly because the skips depend on the parameters passed to the function (except for forest).

> Does it work when the number of processes used to run the test is < 2?
> Does it make sense to add @pytest.mark.mpi(min_size=2)?

Infra sets up the tests to run with 4 processes, but with 1 process it would just run batch GPU, so I don't see why it would not work. All tests are set up to skip if MPI or GPU is not present.
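
For illustration, a minimal sketch of the marker and the skip behavior discussed here (the _mpi_and_gpu_available flag, the test body, and the parameter values are placeholders, not this PR's actual code):

import pytest

@pytest.mark.mpi(min_size=2)  # pytest-mpi marker suggested above: require at least 2 ranks
@pytest.mark.skipif(
    not _mpi_and_gpu_available,  # hypothetical guard flag: skip when MPI or GPU is missing
    reason="requires an MPI launch and a GPU queue",
)
@pytest.mark.parametrize("n_components", [None, 2, "mle"])
def test_pca_spmd_sketch(n_components):
    if n_components == "mle":
        # parameter-dependent skip for a known issue from the PR description
        pytest.skip("PCA falls back if n_components is 'mle'")
    ...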

@samir-nasibli (Contributor):
> Currently this is only added to internal CI, since that is where we have GPU validation - see infra PR 712. Not sure if it makes sense to document it in the public repo.

We are adding a testing module to the public repository. It seems to me it makes sense to add a doc with info on how to launch the tests, including links to mpi-pytest.
This could be done in this PR or in a separate PR. We have a checkbox in our PR template for adding/updating docs.

@samir-nasibli (Contributor):
> 1. I don't see skips in the test suites for the known issues; is it possible to mark the defects by disabling the related test cases?

> There are pytest.skip() calls within the test functions - mostly because the skips depend on the parameters passed to the function (except for forest).

Yes, I saw that before. My question came up because it is not everywhere. For example, for KMeans I did not find a skip for the issue specified in the description.

@samir-nasibli (Contributor) left a comment:
Looks good to me. Just minor comments, which could probably be addressed in separate PRs.
Assuming green CI.

@ethanglaser (Contributor Author):
> Yes, I saw that before. My question came up because it is not everywhere. For example, for KMeans I did not find a skip for the issue specified in the description.

True - in some cases it's not handled by pytest.skip(); instead, parts of the test are commented out. For KMeans, for instance, the iteration check and the init check are commented out, because I still want to keep some validation. If there is a good way to handle this with pytest.skip instead, I am open to other ideas.
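
One possible alternative, sketched here only as an idea (the helpers and the _kmeans_iter_counts_known_to_differ flag are hypothetical, and this is not what the PR does): run the checks that already pass, then mark only the known-issue check as an expected failure instead of commenting it out.

import pytest

def test_kmeans_spmd_sketch():
    spmd_model, batch_model = _fit_spmd_and_batch_kmeans()   # hypothetical helper
    _assert_unaffected_attributes(spmd_model, batch_model)   # checks that already pass
    if _kmeans_iter_counts_known_to_differ:                  # hypothetical flag for the tracked defect
        pytest.xfail("KMeans iteration counts do not align (known issue)")
    assert spmd_model.n_iter_ == batch_model.n_iter_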

> We are adding a testing module to the public repository. It seems to me it makes sense to add a doc with info on how to launch the tests, including links to mpi-pytest. This could be done in this PR or in a separate PR. We have a checkbox in our PR template for adding/updating docs.

I will update the SPMD docs task to include this - we currently do not have any documentation of our SPMD interfaces at all, so I am not sure it can be added yet.

Thanks for the reviews. Still working out some CI fixes with the introduction of float32, but I will share the job once it's cleaner.

@samir-nasibli (Contributor):
I see some test failures on the latest CI job provided.
I will not dismiss my approval, but please attach a green CI link before the merge. Feel free to request a review again if needed.

@ethanglaser merged commit 1e645b8 into uxlfoundation:main on Aug 1, 2024
17 checks passed
ethanglaser added a commit that referenced this pull request Aug 1, 2024
* Add simple spmd pytest for basic stats

* add reason to skipif

* blacked

* isorted

* import revisions

* adding common spmd test functionality

* add simple covariance mpi pytest

* add linreg and auto data tests

* first draft complete

* follow-ups, TODOs, formatting

* fix knn manual tests

* deselect pca issues

* address minor CI fails

* improvements to _spmd_support.py

* get_local_tensor cleanup

* black

* add random state to statistic data gen

* manual to gold

* add logreg skip

* isort

* oops

* address some comments

* relative imports for _utils_spmd

* underscore prefix for test functions

* add underscore prefix to variable

* black

* revert relative import, use sklearnex

* revert unordered logic back to original

* black

* add dataframes support to testing

* black formatting

* black + minor fix

* oops

* skips for latest fails

* address comments, move _as_numpy usage to utils

* trying logreg gpu vs spmd instead of cpu

* minor follow up to logreg

* basicstats API upd changes, logreg thresholds

* oops

* oops neighbors

* loosen logreg threshold

* cleanup

* minor restorations and formatting

* add dtype parameter

* dtype uniform revision

* float32 threshold updates pt 1

* float32 threshold updates pt 2

* float32 threshold updates pt 3

* black

* skip neighbors check for float32

* final neighbors threshold update

* Update sklearnex/tests/_utils_spmd.py

Co-authored-by: olegkkruglov <[email protected]>

---------

Co-authored-by: olegkkruglov <[email protected]>
Labels: testing (Tests for sklearnex/daal4py/onedal4py & patching sklearn)
5 participants