Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: SPMD interface for IncrementalLinearRegression #1972

Merged
merged 14 commits into from
Sep 5, 2024

Conversation

olegkkruglov
Copy link
Contributor

@olegkkruglov olegkkruglov commented Jul 29, 2024

Description

  • Added SPMD interface for IncrementalLinearRegression
  • Changed policy saving workflow, now queue is saved to attributes instead of policy. It is necessary because finalize_fit requires spmd_policy, but partial_fit requires data_parallel_policy on oneDAL side
  • finalize_fit now uses provided queue for computations on onedal4py side.
  • Contains some content from TEST: test coverage for sklearnex SPMD ifaces #1777 for test implementation

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.

@olegkkruglov olegkkruglov added enhancement New feature or request testing Tests for sklearnex/daal4py/onedal4py & patching sklearn labels Jul 29, 2024
onedal/spmd/linear_model/incremental_linear_model.py Outdated Show resolved Hide resolved
Comment on lines 67 to 84
X, y = _convert_to_supported(policy, X, y)

if not hasattr(self, "_dtype"):
self._dtype = get_dtype(X)
self._params = self._get_onedal_params(self._dtype)

y = np.asarray(y).astype(dtype=self._dtype)
self._y_ndim_1 = y.ndim == 1

X, y = _check_X_y(X, y, dtype=[np.float64, np.float32], accept_2d_y=True)

self.n_features_in_ = _num_features(X, fallback_1d=True)
X_table, y_table = to_table(X, y)
hparams = get_hyperparameters("linear_regression", "train")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is any of this needed? Shouldn't this be covered by batch estimator call? There are no data preprocessing function calls in other spmd functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method overrides the regular one, that's why data preprocessing is necessary here as well

@samir-nasibli
Copy link
Contributor

@olegkkruglov please rebase your branch

@ethanglaser
Copy link
Contributor

/intelci: run

@ethanglaser
Copy link
Contributor

@uxlfoundation uxlfoundation deleted a comment from olegkkruglov Aug 20, 2024
@olegkkruglov
Copy link
Contributor Author

@olegkkruglov
Copy link
Contributor Author

/intelci: run

Copy link
Contributor

@samir-nasibli samir-nasibli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olegkkruglov more than one approval from the team is needed for all online algos merge.

@samir-nasibli
Copy link
Contributor

/intelci: run

@olegkkruglov
Copy link
Contributor Author

/intelci: run

Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to count finite checks in partial fit, there is a discussion of tolerances vs batch and a removal of an unnecessary variable.

Would it worth testing some of these versus the sklearnex standard linear regression (since its been battle-tested in use from almost the get-go of the repo)?

@icfaust
Copy link
Contributor

icfaust commented Sep 3, 2024

/azp run CI

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@icfaust
Copy link
Contributor

icfaust commented Sep 3, 2024

/intelci: run

@olegkkruglov
Copy link
Contributor Author

/intelci: run

@ethanglaser
Copy link
Contributor

Please resolve internal CI fails (both conformance and test threshold issues) before merging

@olegkkruglov
Copy link
Contributor Author

olegkkruglov commented Sep 4, 2024

sklearnex/spmd/linear_model/incremental_linear_model.py Outdated Show resolved Hide resolved
onedal/utils/validation.py Outdated Show resolved Hide resolved
"linear_model", "regression", "partial_train_result"
)

def partial_fit(self, X, y, queue=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still a little confused why this is re-implemented and cannot take the base estimator's - can you clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have spmd partial_fit on c++ side, that's why it is reimplemented here to take non-spmd backend

@ethanglaser
Copy link
Contributor

CI looks good

Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My approval is contingent on addressing @ethanglaser 's comments.

@olegkkruglov olegkkruglov merged commit 45fc83d into uxlfoundation:main Sep 5, 2024
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request testing Tests for sklearnex/daal4py/onedal4py & patching sklearn
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants