ENH: SPMD interface for IncrementalEmpiricalCovariance #1941

olegkkruglov · 2024-07-16T12:24:04Z

Description

Added an SPMD interface for IncrementalEmpiricalCovariance.
Added an example of its usage.
Changed the policy-saving workflow: the queue is now saved to attributes instead of the policy. This is necessary because finalize_fit requires spmd_policy, but partial_fit requires data_parallel_policy on the oneDAL side.
finalize_fit now uses the provided queue for computations on the onedal4py side.
Contains some content from TEST: test coverage for sklearnex SPMD ifaces #1777 for the test implementation.

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes, if necessary.
The unit tests pass successfully.
I have run it locally and tested the changes extensively.
I have resolved any merge conflicts that might occur with the base branch.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)
I have added a respective label(s) to PR if I have a permission for that.

onedal/covariance/covariance.cpp

sklearnex/covariance/incremental_covariance.py

ethanglaser · 2024-07-16T22:32:47Z

onedal/covariance/incremental_covariance.py

+        self._partial_result = BaseEstimator._get_backend(
+            self, "covariance", None, "partial_compute_result"


How about inheriting from BaseEstimator in the class definition instead?

BaseEmpiricalCovariance already inherits from BaseEstimator. The catch is that the SPMD class also inherits from BaseEstimatorSPMD, which has its own _get_backend. If we called self._get_backend here, the SPMD backend would be used, and it does not contain partial_compute_result.

I'd prefer a definition of _reset in the SPMD interface which uses a super call for locality code. That would be simpler for maintainers in the future to see why certain things are done. At a minimum it needs a comment in the code.

It would also be necessary to redefine partial_fit, because partial_compute does not exist in the SPMD backend either. I'm not sure that duplicating the code is a better idea than the current implementation.

I should also add that a super call would not work there: BaseEstimator and BaseEstimatorSPMD have the same methods, and if the class has both of them as parents we have to specify explicitly which one should be called anyway.
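
To illustrate the point about two parents that share a method name, here is a minimal sketch with toy stand-in classes. These are not the real onedal classes: the class bodies and the returned strings are placeholders, and only the names BaseEstimator, BaseEstimatorSPMD, _get_backend, _reset and partial_compute_result come from the thread.

```python
# Toy stand-ins only; the real onedal classes are more involved.
class BaseEstimator:
    def _get_backend(self, module, submodule, method):
        return f"data_parallel:{module}.{method}"


class BaseEstimatorSPMD:
    def _get_backend(self, module, submodule, method):
        return f"spmd:{module}.{method}"


class IncrementalCovariance(BaseEstimator):
    def _reset(self):
        # Explicit class call: always uses the non-SPMD lookup,
        # no matter where BaseEstimatorSPMD sits in a subclass's MRO.
        self._partial_result = BaseEstimator._get_backend(
            self, "covariance", None, "partial_compute_result"
        )


class IncrementalCovarianceSPMD(BaseEstimatorSPMD, IncrementalCovariance):
    pass


est = IncrementalCovarianceSPMD()
est._reset()

# Normal attribute lookup follows the MRO and finds the SPMD version first.
implicit = est._get_backend("covariance", None, "partial_compute_result")
explicit = est._partial_result
```

In this toy setup `implicit` resolves through BaseEstimatorSPMD while `explicit` resolves through BaseEstimator, which is the behavior the explicit call in `_reset` relies on.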

I would ask you to do the refactoring after this PR is merged. Currently this looks like a workaround.
Create a BaseIncrementalEstimator whose get_backend method returns the required backend depending on the provided method name. This would be common to all incremental algorithms.

Create a BaseIncrementalEstimator whose get_backend method returns the required backend depending on the provided method name. This would be common to all incremental algorithms.

this might be a good idea, I'll think about it
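
One possible shape for the proposed base class, as a rough sketch only. Everything here is an assumption made for illustration: the class does not exist in onedal or sklearnex, plain dicts stand in for the two compiled backends, and `finalize_compute` is a placeholder name for whatever the SPMD backend exposes for finalization.

```python
class BaseIncrementalEstimator:
    """Hypothetical base class; not part of onedal or sklearnex."""

    # Assumption taken from the thread: these names are missing from the SPMD backend.
    _data_parallel_only = frozenset({"partial_compute", "partial_compute_result"})

    def __init__(self, data_parallel_backend, spmd_backend):
        self._data_parallel_backend = data_parallel_backend
        self._spmd_backend = spmd_backend

    def _get_backend(self, method):
        # Route by method name so callers never pick a backend themselves.
        if method in self._data_parallel_only:
            return self._data_parallel_backend[method]
        return self._spmd_backend[method]


# Plain dicts stand in for the two backends.
dp_backend = {
    "partial_compute": "dp.partial_compute",
    "partial_compute_result": "dp.partial_compute_result",
}
spmd_backend = {"finalize_compute": "spmd.finalize_compute"}  # placeholder name

est = BaseIncrementalEstimator(dp_backend, spmd_backend)
partial_impl = est._get_backend("partial_compute")
finalize_impl = est._get_backend("finalize_compute")
```

The routing table lives in one place, so each incremental algorithm would only need to declare which method names are unavailable in the SPMD backend.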

ethanglaser · 2024-07-16T22:34:25Z

onedal/covariance/incremental_covariance.py

+        if not hasattr(self, "_queue"):
+            self._queue = queue


why is this necessary? It should be handled by universal functionality, not in estimators

It is for finalize_fit dispatching. finalize_fit has no data argument, so if the user does not provide a queue explicitly, the last queue from partial_fit is used.

Can't the last queue be extracted from the stored policy via the _queue property? A lot of this logic might be unnecessary. @olegkkruglov let me know.

That is how it was done before this PR. But it turned out that different policies must be used in finalize_fit and partial_fit, so the policy stored from partial_fit cannot be reused.

Since it's just partial_fit and reset that require a data parallel policy, I would overload partial_fit and reset in the SPMD interface, and make sure that the correct policy is taken there via a super call. It is a small duplication of code, but at least it's clear to the developer and the user what is going on, and it's located in a place where someone looking to understand the SPMD interface can see the limitations of partial_fit for SPMD in the incremental algos. It would then be more straightforward for @samir-nasibli's request for a refactor with an Incremental base class.
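
A rough sketch of what overloading partial_fit in the SPMD class could look like, using toy classes. This is not the code in the PR: the `_get_policy` bodies, the returned tuples and the `*_policy_` attributes are invented for illustration. Because both parents define `_get_policy`, a plain `super()` call would still find the SPMD version first, so the sketch uses two-argument `super` to start the lookup after BaseEstimatorSPMD.

```python
# Toy stand-ins; policies are represented as (name, queue) tuples.
class BaseEstimator:
    def _get_policy(self, queue):
        return ("data_parallel_policy", queue)


class BaseEstimatorSPMD:
    def _get_policy(self, queue):
        return ("spmd_policy", queue)


class IncrementalCovariance(BaseEstimator):
    def partial_fit(self, X, queue=None):
        self.partial_fit_policy_ = self._get_policy(queue)
        return self

    def finalize_fit(self, queue=None):
        self.finalize_policy_ = self._get_policy(queue)
        return self


class IncrementalCovarianceSPMD(BaseEstimatorSPMD, IncrementalCovariance):
    def partial_fit(self, X, queue=None):
        # Small duplication of the parent body. The two-argument super() skips
        # BaseEstimatorSPMD in the MRO, so BaseEstimator._get_policy is used here.
        self.partial_fit_policy_ = super(BaseEstimatorSPMD, self)._get_policy(queue)
        return self


est = IncrementalCovarianceSPMD().partial_fit([[1.0]], queue="q")
est.finalize_fit(queue="q")
```

With this layout the data parallel policy is confined to the overloaded method, while the inherited finalize_fit keeps picking up the SPMD policy through the normal MRO.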

It is for finalize_fit dispatching. finalize_fit has no data argument, so if the user does not provide a queue explicitly, the last queue from partial_fit is used.

Maybe it makes sense to explicitly ask the user to provide a SYCL queue? Otherwise it is a headache.
I understand that this is based on the oneDAL API, but it seems the real fix should be on the oneDAL side. I didn't find any example of SPMD incremental usage with policies. The interface itself seems inconvenient for a oneDAL user. It makes sense to redesign the API there first and then expose it here, at the sklearnex level.

finalize_fit is called implicitly on the sklearnex side. If we want to keep a scikit-learn-like interface (without an explicit finalize call), the only other option is to call finalize after every call of partial_fit. That option was rejected at the arch meeting, which is why keeping the queue in attributes is currently unavoidable.
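
The pattern described above (remember the queue from partial_fit, finalize lazily when a result is requested) can be sketched with a toy estimator. This is not sklearnex code: it computes a running mean instead of a covariance, the queue is any opaque object, and names such as `_need_to_finalize` and `finalize_calls` are made up for the example.

```python
class ToyIncrementalMean:
    """Toy estimator showing lazy finalization with a remembered queue."""

    def __init__(self):
        self._sum = 0.0
        self._count = 0
        self._queue = None
        self._need_to_finalize = False
        self.finalize_calls = []  # records which queue each finalize used

    def partial_fit(self, batch, queue=None):
        self._sum += sum(batch)
        self._count += len(batch)
        self._queue = queue  # updated on every call, so the last queue wins
        self._need_to_finalize = True
        return self

    def _finalize_fit(self, queue=None):
        # No data argument here, so fall back to the queue saved by partial_fit.
        used = queue if queue is not None else self._queue
        self.finalize_calls.append(used)
        self._mean = self._sum / self._count
        self._need_to_finalize = False

    @property
    def mean_(self):
        # Finalize only when a result is actually requested.
        if self._need_to_finalize:
            self._finalize_fit()
        return self._mean


est = ToyIncrementalMean()
est.partial_fit([1.0, 2.0], queue="queue-a")
est.partial_fit([3.0], queue="queue-b")
result = est.mean_
result_again = est.mean_  # no second finalize, nothing changed since
```

Finalization runs once per batch of updates rather than after every partial_fit call, which is the trade-off the comment above refers to.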

examples/sklearnex/incremental_covariance_spmd.py

olegkkruglov · 2024-07-17T00:35:53Z

/intelci: run

icfaust

Some high level questions about the policy queues and the backend. Also comments would be nice.

onedal/_device_offload.py

icfaust · 2024-07-17T09:09:25Z

onedal/covariance/incremental_covariance.py

+        self._partial_result = BaseEstimator._get_backend(
+            self, "covariance", None, "partial_compute_result"


I'd prefer a definition of _reset in the SPMD interface which uses a super call for locality code. That would be simpler for maintainers in the future to see why certain things are done. At a minimum it needs a comment in the code.

icfaust · 2024-07-17T09:16:36Z

onedal/covariance/incremental_covariance.py

+        if not hasattr(self, "_queue"):
+            self._queue = queue


Can't the last queue be extracted from the stored policy via the _queue property? A lot of this logic might be unnecessary. @olegkkruglov let me know.

olegkkruglov · 2024-07-17T09:56:26Z

/intelci: run

olegkkruglov · 2024-07-17T11:10:26Z

https://ecmd.jf.intel.com/commander/link/jobDetails/jobs/ef4423ff-9ba8-f1d0-abf8-a4bf010d0e2e

icfaust

I think overloading partial_fit and _reset in the SPMD class is preferable to changing the underlying classes and storing queues. It would be simpler to understand; SPMD problems should stay in SPMD.

icfaust · 2024-07-18T08:43:43Z

onedal/covariance/incremental_covariance.py

+        if not hasattr(self, "_queue"):
+            self._queue = queue


Since it's just partial_fit and reset that require a data parallel policy, I would overload partial_fit and reset in the SPMD interface, and make sure that the correct policy is taken there via a super call. It is a small duplication of code, but at least it's clear to the developer and the user what is going on, and it's located in a place where someone looking to understand the SPMD interface can see the limitations of partial_fit for SPMD in the incremental algos. It would then be more straightforward for @samir-nasibli's request for a refactor with an Incremental base class.

olegkkruglov · 2024-07-18T09:21:20Z

I think overloading partial_fit and _reset in the SPMD class is preferable to changing the underlying classes and storing queues. It would be simpler to understand; SPMD problems should stay in SPMD.

Why is storing the queue worse than storing the policy? As far as I can see, storing the queue instead of the policy is unavoidable, because the policy from partial_fit can't be used in finalize_fit, and that was the only thing the policy was stored for.

olegkkruglov · 2024-07-20T18:48:55Z

/intelci: run

examples/sklearnex/incremental_covariance_spmd.py

sklearnex/spmd/covariance/incremental_covariance.py

sklearnex/covariance/incremental_covariance.py

onedal/spmd/covariance/incremental_covariance.py

olegkkruglov · 2024-07-23T11:21:45Z

/intelci: run

olegkkruglov · 2024-07-23T12:58:10Z

/intelci: run

olegkkruglov · 2024-09-03T12:59:04Z

Generally good to go. Given the issues observed with dtypes in #1961, could you also parametrize dtype here, just to see what it does to the results? (No dtype assert necessary this time.)

done

olegkkruglov · 2024-09-03T12:59:14Z

/intelci: run

sklearnex/spmd/covariance/incremental_covariance.py

examples/sklearnex/incremental_covariance_spmd.py

icfaust

Approval is dependent on @samir-nasibli's requests (docstrings at least).

samir-nasibli

Thank you, @olegkkruglov!
Now it looks good to me!
Assuming green CI, please share the internal CI job link.

Expecting a quick follow-up refactoring, based on the tickets created, and the docstrings mentioned.
I am good to go functionally with this PR as is, but the refactoring should be done before CF.

olegkkruglov requested review from samir-nasibli and Alexsandruss as code owners July 16, 2024 12:24

olegkkruglov requested review from icfaust and ethanglaser July 16, 2024 12:24

olegkkruglov force-pushed the inccov-spmd branch 5 times, most recently from 8ddd338 to 2a3fcd5 Compare July 16, 2024 17:42

ethanglaser reviewed Jul 16, 2024

icfaust reviewed Jul 17, 2024

olegkkruglov force-pushed the inccov-spmd branch from 5544ed6 to 3f643f9 Compare July 17, 2024 09:44

icfaust reviewed Jul 18, 2024

olegkkruglov added the enhancement New feature or request label Jul 22, 2024

olegkkruglov requested review from icfaust and ethanglaser July 22, 2024 12:07

ethanglaser reviewed Jul 23, 2024

examples/sklearnex/incremental_covariance_spmd.py (resolved)

ethanglaser reviewed Jul 23, 2024

sklearnex/spmd/covariance/incremental_covariance.py (outdated, resolved)

ethanglaser reviewed Jul 23, 2024

sklearnex/covariance/incremental_covariance.py (resolved)

ethanglaser reviewed Jul 23, 2024

onedal/spmd/covariance/incremental_covariance.py (outdated, resolved)

ethanglaser reviewed Jul 23, 2024

onedal/spmd/covariance/incremental_covariance.py (resolved)

olegkkruglov force-pushed the inccov-spmd branch from fa06cc2 to 714a932 Compare July 23, 2024 11:17

olegkkruglov mentioned this pull request Jul 29, 2024

ENH: Adding IncrementalRidge support into sklearnex #1957

Merged

8 tasks

olegkkruglov added 21 commits September 3, 2024 05:35

Rename classes

48d48b9

Add comments to example

6f30335

Fix lint

7916936

Add test skip for uncovered issue on C++ side

ccb9397

Add spmd tests

9e83f55

Fix lint

a42ea16

Fix is_gpu check

62f5fe4

Fix typo in partial_fit

8a1f44a

Fix is_gpu check

f6cdaa2

Rename class reference

73d00d6

Fix test

861ece3

Change version for test skip

a2d9f7b

Increase dataset size in test

9525fd7

Address comments

648eb67

Remove support_usm_ndarray

8b2911a

Revert accidentally pushed changes in docs

d426b7d

Rename class reference

e61861b

Update self._queue in every partial_fit call

125da7c

Fix skip message in test

3384c6d

Change naming for base class reference

7625457

Address comments

e5458b3

olegkkruglov force-pushed the inccov-spmd branch from a7e9085 to e5458b3 Compare September 3, 2024 12:58

samir-nasibli reviewed Sep 3, 2024

sklearnex/spmd/covariance/incremental_covariance.py (resolved)

samir-nasibli reviewed Sep 3, 2024

examples/sklearnex/incremental_covariance_spmd.py (resolved)

icfaust approved these changes Sep 4, 2024

samir-nasibli approved these changes Sep 4, 2024

Add docstring

3018e9c

olegkkruglov merged commit c214145 into uxlfoundation:main Sep 4, 2024
9 of 11 checks passed
