ENH: SPMD interface for IncrementalLinearRegression #1972
```python
X, y = _convert_to_supported(policy, X, y)

if not hasattr(self, "_dtype"):
    self._dtype = get_dtype(X)
    self._params = self._get_onedal_params(self._dtype)

y = np.asarray(y).astype(dtype=self._dtype)
self._y_ndim_1 = y.ndim == 1

X, y = _check_X_y(X, y, dtype=[np.float64, np.float32], accept_2d_y=True)

self.n_features_in_ = _num_features(X, fallback_1d=True)
X_table, y_table = to_table(X, y)
hparams = get_hyperparameters("linear_regression", "train")
```
Why is any of this needed? Shouldn't this be covered by the batch estimator call? There are no data preprocessing function calls in the other SPMD functionality.
This method overrides the regular one, so the data preprocessing is necessary here as well.
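A minimal sketch of why an overriding `partial_fit` has to repeat the preprocessing itself, assuming a simplified pure-NumPy estimator rather than the onedal4py code: the class `IncrementalLinearSketch` and its attributes are hypothetical. It mirrors the pattern in the diff: the first call pins the dtype, later batches are cast to it, and the 1-D shape of `y` is remembered so the coefficients come back in the matching shape.

```python
import numpy as np


class IncrementalLinearSketch:
    """Hypothetical incremental OLS (no intercept) accumulating X^T X and X^T y."""

    def partial_fit(self, X, y):
        X = np.asarray(X)
        # Pin the dtype on the first batch only; later batches are cast to it.
        if not hasattr(self, "_dtype"):
            self._dtype = X.dtype if X.dtype in (np.float32, np.float64) else np.dtype(np.float64)
        X = X.astype(self._dtype)
        y = np.asarray(y).astype(self._dtype)
        # Remember whether targets were 1-D so coef_ can be reshaped at finalize.
        self._y_ndim_1 = y.ndim == 1
        y2d = y.reshape(-1, 1) if self._y_ndim_1 else y
        if X.ndim != 2 or X.shape[0] != y2d.shape[0]:
            raise ValueError("X must be 2-D with as many rows as y")
        self.n_features_in_ = X.shape[1]
        xtx, xty = X.T @ X, X.T @ y2d
        if hasattr(self, "_xtx"):
            self._xtx += xtx
            self._xty += xty
        else:
            self._xtx, self._xty = xtx, xty
        return self

    def finalize_fit(self):
        # Solve the accumulated normal equations; shape (n_targets, n_features).
        coef = np.linalg.solve(self._xtx, self._xty).T
        self.coef_ = coef.ravel() if self._y_ndim_1 else coef
        return self


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w = np.array([1.0, -2.0, 0.5])
y = X @ w
est = IncrementalLinearSketch()
for start in range(0, 100, 25):
    est.partial_fit(X[start:start + 25], y[start:start + 25])
est.finalize_fit()
print(est.coef_.shape)  # (3,) because y was 1-D
```

If the override skipped these steps, a float64 batch arriving after a float32 one would silently change the accumulator dtype, and a 1-D `y` would produce a 2-D `coef_`.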
@olegkkruglov please rebase your branch.
Branch force-pushed from 7015e98 to 6d5841b
/intelci: run
/intelci: run
Branch force-pushed from 805e08d to 678da2f
@olegkkruglov merging any online algorithm requires more than one approval from the team.
/intelci: run
Branch force-pushed from 7fda21f to fa1cc04
/intelci: run
The finite checks in partial_fit still need to be accounted for. There is also an open discussion of tolerances versus the batch estimator, and an unnecessary variable should be removed.
Would it be worth testing some of these against the standard sklearnex linear regression, since it has been battle-tested in use almost from the start of the repo?
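One way to frame the suggested comparison, as a sketch: fit the same data once as a single batch and once in chunks, then compare coefficients and intercept under an explicit, dtype-dependent tolerance. Plain NumPy least squares stands in for both estimators here; `batch_fit`, `incremental_fit`, `check_matches_batch` and the `TOLERANCES` values are hypothetical and not taken from the PR's test file. In a real test the stand-ins would be replaced by the sklearnex estimators.

```python
import numpy as np

# Assumed tolerances: float32 accumulates more rounding error than float64.
TOLERANCES = {np.float32: 1e-3, np.float64: 1e-8}


def batch_fit(X, y):
    """Reference: least squares with intercept on all the data at once."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1), dtype=X.dtype)])
    sol, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return sol[:-1], sol[-1]  # (coef, intercept)


def incremental_fit(X, y, n_batches):
    """Stand-in incremental fit: accumulate the normal equations batch by batch."""
    n_cols = X.shape[1] + 1
    xtx = np.zeros((n_cols, n_cols), dtype=X.dtype)
    xty = np.zeros(n_cols, dtype=X.dtype)
    for Xb, yb in zip(np.array_split(X, n_batches), np.array_split(y, n_batches)):
        Xa = np.hstack([Xb, np.ones((Xb.shape[0], 1), dtype=X.dtype)])
        xtx += Xa.T @ Xa
        xty += Xa.T @ yb
    sol = np.linalg.solve(xtx, xty)
    return sol[:-1], sol[-1]


def check_matches_batch(X, y, n_batches, dtype):
    X, y = X.astype(dtype), y.astype(dtype)
    coef_b, icpt_b = batch_fit(X, y)
    coef_i, icpt_i = incremental_fit(X, y, n_batches)
    tol = TOLERANCES[dtype]
    assert np.allclose(coef_i, coef_b, rtol=tol, atol=tol)
    assert np.allclose(icpt_i, icpt_b, rtol=tol, atol=tol)


rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 5))
y = X @ rng.standard_normal(5) + 3.0 + 0.1 * rng.standard_normal(1000)
for dtype in (np.float32, np.float64):
    for n_batches in (1, 4, 10):
        check_matches_batch(X, y, n_batches, dtype)
```

Keeping the tolerance per dtype makes the "tolerances vs batch" question explicit in the test instead of hiding it in a single loose threshold.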
sklearnex/spmd/linear_model/tests/test_incremental_linear_spmd.py (two review threads, outdated and resolved)
/azp run CI
Azure Pipelines successfully started running 1 pipeline(s).
/intelci: run
Branch force-pushed from 387fd52 to 73cddd3
/intelci: run
Please resolve the internal CI failures (both conformance and test-threshold issues) before merging.
Branch force-pushed from 73cddd3 to 4871f34
```python
    "linear_model", "regression", "partial_train_result"
)

def partial_fit(self, X, y, queue=None):
```
I am still a little confused about why this is reimplemented instead of reusing the base estimator's version. Can you clarify?
We don't have an SPMD partial_fit on the C++ side, so it is reimplemented here to use the non-SPMD backend.
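To illustrate why `partial_fit` can run on a non-SPMD backend while only the final step needs SPMD, here is a NumPy sketch that simulates several ranks in one process. It assumes the usual normal-equations formulation of incremental linear regression: each rank's partial step touches only local data, and the per-rank partial results are combined once at finalize. The function names are hypothetical, and the plain `sum` over ranks stands in for the collective (an allreduce in a real SPMD run); this is not oneDAL's implementation.

```python
import numpy as np


def local_partial_fit(partial, X_local, y_local):
    """Rank-local step: accumulate this rank's X^T X and X^T y, no communication."""
    Xa = np.hstack([X_local, np.ones((X_local.shape[0], 1))])  # intercept column
    xtx, xty = Xa.T @ Xa, Xa.T @ y_local
    if partial is None:
        return xtx, xty
    return partial[0] + xtx, partial[1] + xty


def finalize_fit(partials):
    """Collective step: sum partial results over ranks, then solve once."""
    xtx = sum(p[0] for p in partials)  # stands in for an allreduce
    xty = sum(p[1] for p in partials)
    sol = np.linalg.solve(xtx, xty)
    return sol[:-1], sol[-1]  # (coef, intercept)


rng = np.random.default_rng(1)
X = rng.standard_normal((600, 4))
w = np.array([0.5, -1.0, 2.0, 0.0])
y = X @ w + 1.5

partials = []
# Three simulated ranks, each calling partial_fit on two local batches.
for X_rank, y_rank in zip(np.array_split(X, 3), np.array_split(y, 3)):
    p = None
    for Xb, yb in zip(np.array_split(X_rank, 2), np.array_split(y_rank, 2)):
        p = local_partial_fit(p, Xb, yb)
    partials.append(p)

coef, intercept = finalize_fit(partials)
```

Because only the reduction in the final step requires ranks to communicate, a data-parallel `partial_fit` plus an SPMD `finalize_fit` is enough to reproduce the single-process result.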
CI looks good.
My approval is contingent on addressing @ethanglaser's comments.
Branch force-pushed from f4eebdf to fe4175b
Description
- `finalize_fit` requires `spmd_policy`, but `partial_fit` requires `data_parallel_policy` on the oneDAL side.
- `finalize_fit` now uses the provided queue for computations on the onedal4py side.

Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance