ENH: SPMD interface for IncrementalLinearRegression #1972

olegkkruglov · 2024-07-29T20:24:48Z

Description

Added an SPMD interface for IncrementalLinearRegression.
Changed the policy-saving workflow: the queue is now saved to attributes instead of the policy. This is necessary because finalize_fit requires spmd_policy, while partial_fit requires data_parallel_policy on the oneDAL side.
finalize_fit now uses the provided queue for computations on the onedal4py side.
Contains some content from "TEST: test coverage for sklearnex SPMD ifaces" (#1777) for the test implementation.
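
The queue-handling change described above can be pictured with a minimal sketch. The classes below are hypothetical stand-ins, not the onedal4py API: `DataParallelPolicy` and `SpmdPolicy` only record the queue they were built from, and the estimator keeps the queue (rather than a policy) as an attribute so each stage can build the policy type it needs. Falling back to the stored queue when `finalize_fit` receives none is an assumption made for illustration.

```python
class DataParallelPolicy:
    """Hypothetical stand-in for a single-process, data-parallel policy."""

    def __init__(self, queue):
        self.queue = queue


class SpmdPolicy:
    """Hypothetical stand-in for a distributed (SPMD) policy."""

    def __init__(self, queue):
        self.queue = queue


class IncrementalEstimatorSketch:
    """Keeps the queue as an attribute and builds a policy per stage."""

    def __init__(self):
        self._queue = None
        self.policies_used = []  # recorded only for illustration

    def partial_fit(self, batch, queue=None):
        # Refresh the stored queue on every call instead of caching a policy.
        self._queue = queue
        policy = DataParallelPolicy(self._queue)
        self.policies_used.append(type(policy).__name__)
        return self

    def finalize_fit(self, queue=None):
        # Assumption: prefer an explicitly provided queue, else the stored one.
        active_queue = queue if queue is not None else self._queue
        policy = SpmdPolicy(active_queue)
        self.policies_used.append(type(policy).__name__)
        return policy


est = IncrementalEstimatorSketch()
est.partial_fit([1.0, 2.0], queue="gpu:0").partial_fit([3.0], queue="gpu:0")
final_policy = est.finalize_fit()
print(est.policies_used)
```

Storing the queue keeps the two stages decoupled: neither has to reuse a policy object created for the other.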

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added the respective label(s) to the PR if I have permission to do so.
I have resolved any merge conflicts that might occur with the base branch.

Testing

The unit tests pass successfully.
I have run it locally and tested the changes extensively.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if a performance change is expected.
I have provided justification why performance has changed or why changes are not expected.

onedal/spmd/linear_model/incremental_linear_model.py

ethanglaser · 2024-07-30T05:18:04Z

onedal/spmd/linear_model/incremental_linear_model.py

+        X, y = _convert_to_supported(policy, X, y)
+
+        if not hasattr(self, "_dtype"):
+            self._dtype = get_dtype(X)
+            self._params = self._get_onedal_params(self._dtype)
+
+        y = np.asarray(y).astype(dtype=self._dtype)
+        self._y_ndim_1 = y.ndim == 1
+
+        X, y = _check_X_y(X, y, dtype=[np.float64, np.float32], accept_2d_y=True)
+
+        self.n_features_in_ = _num_features(X, fallback_1d=True)
+        X_table, y_table = to_table(X, y)
+        hparams = get_hyperparameters("linear_regression", "train")


Why is any of this needed? Shouldn't this be covered by batch estimator call? There are no data preprocessing function calls in other spmd functionality.

olegkkruglov · 2024-07-30

This method overrides the regular one, which is why data preprocessing is needed here as well.

samir-nasibli · 2024-08-05T12:37:43Z

@olegkkruglov please rebase your branch

ethanglaser · 2024-08-20T00:52:47Z

/intelci: run

onedal/linear_model/incremental_linear_model.py

onedal/spmd/linear_model/incremental_linear_model.py

ethanglaser · 2024-08-20T18:47:24Z

https://intel-ci.intel.com/ef5f1fac-cd11-f152-9962-a4bf010d0e2e

olegkkruglov · 2024-08-21T14:13:43Z

https://intel-ci.intel.com/ef5fc74d-fbe9-f112-993d-a4bf010d0e2e

olegkkruglov · 2024-08-22T17:45:18Z

/intelci: run

samir-nasibli

@olegkkruglov more than one approval from the team is needed to merge any of the online algorithms.

samir-nasibli · 2024-08-30T06:29:06Z

/intelci: run

olegkkruglov · 2024-08-30T12:14:12Z

/intelci: run

icfaust

Need to count finite checks in partial_fit; there is also a discussion of tolerances versus batch, and an unnecessary variable to remove.

Would it be worth testing some of these against the sklearnex standard linear regression (since it's been battle-tested in use almost from the get-go of the repo)?
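
As a rough illustration of the batch comparison suggested above, the sketch below fits a one-feature linear model incrementally from running sums and checks it against a fit computed over the full data at once. It is plain Python, not sklearnex code; `RunningLinearFit`, `batch_fit`, and the tolerance value are hypothetical and chosen only for the example.

```python
import math
import random


class RunningLinearFit:
    """Accumulates sufficient statistics for y = slope * x + intercept."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def partial_fit(self, xs, ys):
        # Only running sums are kept; the batch itself is not stored.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def finalize_fit(self):
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def batch_fit(xs, ys):
    """Reference fit over the full data, using centered sums."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x - 0.5 + rng.gauss(0.0, 0.01) for x in xs]

model = RunningLinearFit()
for start in range(0, len(xs), 50):  # four batches of 50 rows
    model.partial_fit(xs[start:start + 50], ys[start:start + 50])

inc_slope, inc_intercept = model.finalize_fit()
ref_slope, ref_intercept = batch_fit(xs, ys)

TOL = 1e-9  # illustrative tolerance, not the value used in the PR's tests
agrees = (
    math.isclose(inc_slope, ref_slope, rel_tol=TOL, abs_tol=TOL)
    and math.isclose(inc_intercept, ref_intercept, rel_tol=TOL, abs_tol=TOL)
)
```

The two fits use different arithmetic paths, so they agree only up to floating-point rounding, which is why such a check needs an explicit tolerance.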

onedal/spmd/linear_model/incremental_linear_model.py

sklearnex/spmd/linear_model/tests/test_incremental_linear_spmd.py

onedal/spmd/linear_model/incremental_linear_model.py

sklearnex/spmd/linear_model/tests/test_incremental_linear_spmd.py

icfaust · 2024-09-03T13:29:37Z

/azp run CI

azure-pipelines · 2024-09-03T13:29:48Z

Azure Pipelines successfully started running 1 pipeline(s).

icfaust · 2024-09-03T13:30:15Z

/intelci: run

olegkkruglov · 2024-09-03T18:07:58Z

/intelci: run

ethanglaser · 2024-09-04T05:49:58Z

Please resolve the internal CI failures (both conformance and test threshold issues) before merging.

olegkkruglov · 2024-09-04T13:04:19Z

https://intel-ci.intel.com/ef6acb95-bb9a-f1e1-8d5d-a4bf010d0e2e

sklearnex/spmd/linear_model/incremental_linear_model.py

onedal/utils/validation.py

ethanglaser · 2024-09-04T21:17:42Z

onedal/spmd/linear_model/incremental_linear_model.py

+            "linear_model", "regression", "partial_train_result"
+        )
+
+    def partial_fit(self, X, y, queue=None):


I am still a little confused why this is re-implemented and cannot take the base estimator's - can you clarify?

olegkkruglov · 2024-09-05

We don't have an SPMD partial_fit on the C++ side, which is why it is reimplemented here to use the non-SPMD backend.
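
The reasoning in this reply can be sketched with hypothetical stand-ins (none of these names come from onedal4py): the SPMD backend below offers only a finalize step, so the SPMD estimator overrides `partial_fit` to accumulate through the non-SPMD backend and leaves finalization to the SPMD one.

```python
class BatchBackend:
    """Hypothetical non-SPMD backend: supports partial and final steps."""

    name = "batch"

    @staticmethod
    def partial_train(state, batch):
        return state + sum(batch)

    @staticmethod
    def finalize_train(state):
        return state


class SpmdBackend:
    """Hypothetical SPMD backend: only the finalize step exists."""

    name = "spmd"

    @staticmethod
    def finalize_train(state):
        # A real SPMD finalize would combine results across ranks;
        # this stand-in simply returns the local state.
        return state


class IncrementalBase:
    backend = BatchBackend

    def __init__(self):
        self._state = 0.0
        self.calls = []  # recorded only for illustration

    def partial_fit(self, batch):
        self._state = self.backend.partial_train(self._state, batch)
        self.calls.append(("partial", self.backend.name))
        return self

    def finalize_fit(self):
        self.calls.append(("finalize", self.backend.name))
        return self.backend.finalize_train(self._state)


class IncrementalSpmd(IncrementalBase):
    backend = SpmdBackend

    def partial_fit(self, batch):
        # Overridden: SpmdBackend has no partial_train, so accumulation
        # goes through the non-SPMD backend instead.
        self._state = BatchBackend.partial_train(self._state, batch)
        self.calls.append(("partial", BatchBackend.name))
        return self


est = IncrementalSpmd()
est.partial_fit([1.0, 2.0]).partial_fit([3.0])
result = est.finalize_fit()
```

Inheriting `partial_fit` unchanged would fail in this sketch, because the base method looks up `partial_train` on whichever backend the subclass declares.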

onedal/linear_model/incremental_linear_model.py

sklearnex/spmd/linear_model/tests/test_incremental_linear_spmd.py

ethanglaser · 2024-09-04T21:38:37Z

CI looks good

icfaust

My approval is contingent on addressing @ethanglaser 's comments.

sklearnex/spmd/linear_model/tests/test_incremental_linear_spmd.py

olegkkruglov requested review from samir-nasibli and Alexsandruss as code owners July 29, 2024 20:24

olegkkruglov added the enhancement and testing labels Jul 29, 2024

olegkkruglov requested review from icfaust and ethanglaser July 29, 2024 20:25

ethanglaser reviewed Jul 30, 2024

olegkkruglov force-pushed the inclin-spmd branch from 7015e98 to 6d5841b August 19, 2024 15:30

ethanglaser reviewed Aug 20, 2024

onedal/linear_model/incremental_linear_model.py (resolved)

ethanglaser reviewed Aug 20, 2024

onedal/spmd/linear_model/incremental_linear_model.py (resolved)

uxlfoundation deleted a comment from olegkkruglov Aug 20, 2024

icfaust mentioned this pull request Aug 27, 2024

ENH: SPMD interface for IncrementalEmpiricalCovariance #1941

Merged

8 tasks

olegkkruglov requested review from maria-Petrova, napetrov and bdmoore1 as code owners August 28, 2024 16:07

bdmoore1 approved these changes Aug 28, 2024

olegkkruglov force-pushed the inclin-spmd branch from 805e08d to 678da2f August 28, 2024 16:45

samir-nasibli reviewed Aug 30, 2024

olegkkruglov force-pushed the inclin-spmd branch from 7fda21f to fa1cc04 August 30, 2024 12:13

icfaust requested review from samir-nasibli and ethanglaser September 3, 2024 12:26

icfaust reviewed Sep 3, 2024

olegkkruglov force-pushed the inclin-spmd branch from 387fd52 to 73cddd3 September 3, 2024 18:04

olegkkruglov force-pushed the inclin-spmd branch from 73cddd3 to 4871f34 September 4, 2024 12:41

ethanglaser reviewed Sep 4, 2024

icfaust approved these changes Sep 5, 2024

olegkkruglov added 13 commits September 5, 2024 02:14

Add incremental distributed linear regression

bc6121f

Fix docstring

78555a5

Rename test file

4028e79

Fix tests

89209d3

Remove support_usm_ndarray

0c7bc4a

Revert accidentally pushed changes in docs

1159c0c

Rename class reference

db6af54

Update self._queue in every partial_fit call

001c6f5

Change naming for base class reference

5d499cd

Address comments

bd69d32

Change test_tolerance

94429b7

Add docstrings

8c1f5b4

Fix docstring

fe4175b

olegkkruglov force-pushed the inclin-spmd branch from f4eebdf to fe4175b September 5, 2024 09:21

ethanglaser reviewed Sep 5, 2024

sklearnex/spmd/linear_model/tests/test_incremental_linear_spmd.py (outdated, resolved)

ethanglaser reviewed Sep 5, 2024

sklearnex/spmd/linear_model/tests/test_incremental_linear_spmd.py (outdated, resolved)

ethanglaser approved these changes Sep 5, 2024

Fix comments

235ad3c

olegkkruglov merged commit 45fc83d into uxlfoundation:main Sep 5, 2024
9 of 11 checks passed
