ENH: SPMD interface for IncrementalEmpiricalCovariance #1941

Merged: 26 commits merged on Sep 4, 2024

@olegkkruglov (Contributor) commented Jul 16, 2024

Description

  • Added an SPMD interface for IncrementalEmpiricalCovariance.
  • Added an example of its usage.
  • Changed the policy-saving workflow: the queue is now saved to attributes instead of the policy. This is necessary because, on the oneDAL side, finalize_fit requires spmd_policy while partial_fit requires data_parallel_policy.
  • finalize_fit now uses the provided queue for computations on the onedal4py side.
  • Contains some content from TEST: test coverage for sklearnex SPMD ifaces #1777 for the test implementation.
  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes, if necessary.
  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.
  • I have resolved any merge conflicts that might occur with the base branch.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)
  • I have added the respective label(s) to the PR if I have permission to do so.
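The description states that partial_fit and finalize_fit need different oneDAL policies. One plausible reason, sketched here in pure Python under that assumption, is that partial results are computed independently on each rank, while finalization has to combine them across ranks. The names (`PartialResult`, `partial_compute`, `merge`, `finalize`) are hypothetical and do not mirror onedal4py's bindings; the raw-moment formula is simply the easiest additive form, not necessarily what oneDAL uses, and it is less numerically stable than centered updates.

```python
from typing import List, Sequence


class PartialResult:
    """Additive sufficient statistics for a covariance estimate (hypothetical names)."""

    def __init__(self, n_features: int):
        self.n_features = n_features
        self.nobs = 0
        self.sums = [0.0] * n_features
        self.crossproduct = [[0.0] * n_features for _ in range(n_features)]


def partial_compute(pr: PartialResult, batch: Sequence[Sequence[float]]) -> PartialResult:
    """Local step, one batch on one rank: accumulate row count, sums, and X^T X."""
    for row in batch:
        pr.nobs += 1
        for i, xi in enumerate(row):
            pr.sums[i] += xi
            for j, xj in enumerate(row):
                pr.crossproduct[i][j] += xi * xj
    return pr


def merge(results: Sequence[PartialResult]) -> PartialResult:
    """Cross-rank reduction: every statistic is additive, so summing suffices."""
    total = PartialResult(results[0].n_features)
    for r in results:
        total.nobs += r.nobs
        for i in range(total.n_features):
            total.sums[i] += r.sums[i]
            for j in range(total.n_features):
                total.crossproduct[i][j] += r.crossproduct[i][j]
    return total


def finalize(pr: PartialResult, bias: bool = True) -> List[List[float]]:
    """Finalize step: turn merged statistics into a covariance matrix.

    bias=True divides by n (maximum-likelihood estimate); bias=False by n - 1.
    """
    n, s, p = pr.nobs, pr.sums, pr.n_features
    denom = n if bias else n - 1
    return [
        [(pr.crossproduct[i][j] - s[i] * s[j] / n) / denom for j in range(p)]
        for i in range(p)
    ]


# Two "ranks" each see part of the data; only finalize needs both results.
rank0 = partial_compute(PartialResult(2), [[1.0, 2.0], [2.0, 4.0]])
rank1 = partial_compute(PartialResult(2), [[3.0, 6.5]])
print(finalize(merge([rank0, rank1])))
```

In this model the per-batch step never communicates, which fits a data-parallel context, while the reduction in `merge` is inherently collective, which fits an SPMD context.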

@olegkkruglov force-pushed the inccov-spmd branch 5 times, most recently from 8ddd338 to 2a3fcd5 (July 16, 2024 17:42)
onedal/covariance/covariance.cpp (resolved)
Comment on lines 61 to 62
self._partial_result = BaseEstimator._get_backend(
    self, "covariance", None, "partial_compute_result"
)
Contributor:

How about inheriting from BaseEstimator in the class definition instead?

@olegkkruglov (Contributor Author), Jul 16, 2024:

BaseEmpiricalCovariance already inherits from BaseEstimator. The catch is that the SPMD class also inherits from BaseEstimatorSPMD, which has its own _get_backend. So if we wrote self._get_backend here, the SPMD backend would be called, and it does not contain partial_compute_result.
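A minimal sketch of the method-resolution issue described here, using stand-in classes that only borrow the names from this thread (the real signatures and the exact base order are assumptions). Because `BaseEstimatorSPMD` precedes the non-SPMD base in the MRO, both `self._get_backend` and `super()._get_backend` reach the SPMD version; only naming `BaseEstimator` explicitly reaches the other one.

```python
class BaseEstimator:
    def _get_backend(self, module, submodule=None, method=None):
        return f"host backend: {module}.{method}"


class BaseEstimatorSPMD:
    def _get_backend(self, module, submodule=None, method=None):
        return f"spmd backend: {module}.{method}"


class BaseEmpiricalCovariance(BaseEstimator):
    pass


# Assumed base order: the SPMD mixin is listed first, so it wins in the MRO.
class IncrementalEmpiricalCovarianceSPMD(BaseEstimatorSPMD, BaseEmpiricalCovariance):
    def via_self(self):
        return self._get_backend("covariance", None, "partial_compute_result")

    def via_super(self):
        # super() starts after this class in the MRO, i.e. at BaseEstimatorSPMD.
        return super()._get_backend("covariance", None, "partial_compute_result")

    def via_explicit_base(self):
        # Naming the class bypasses the MRO lookup entirely.
        return BaseEstimator._get_backend(self, "covariance", None, "partial_compute_result")


est = IncrementalEmpiricalCovarianceSPMD()
print([c.__name__ for c in type(est).__mro__])
print(est.via_self())
print(est.via_super())
print(est.via_explicit_base())
```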

Contributor:

I'd prefer defining _reset in the SPMD interface, using a super call, so that this logic stays local. That would make it simpler for future maintainers to see why certain things are done. At a minimum, it needs a comment in the code.
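One way to apply this suggestion, sketched with hypothetical stand-in classes: keep the workaround in the SPMD subclass by overriding `_reset` there, together with a comment explaining it. Since `super()` from the SPMD subclass would resolve to `BaseEstimatorSPMD._get_backend`, this sketch calls `BaseEstimator._get_backend` explicitly instead. The stand-in SPMD backend simply raises for `partial_compute_result` to model its absence.

```python
class BaseEstimator:
    def _get_backend(self, module, submodule=None, method=None):
        # Stand-in for the regular (data-parallel) backend.
        return {"backend": "host", "method": method}


class BaseEstimatorSPMD:
    def _get_backend(self, module, submodule=None, method=None):
        # Stand-in for the SPMD backend, which has no partial-result methods.
        if method in ("partial_compute", "partial_compute_result"):
            raise AttributeError(f"SPMD backend has no {method}")
        return {"backend": "spmd", "method": method}


class IncrementalEmpiricalCovariance(BaseEstimator):
    def __init__(self):
        self._reset()

    def _reset(self):
        self._partial_result = self._get_backend(
            "covariance", None, "partial_compute_result"
        )


class IncrementalEmpiricalCovarianceSPMD(BaseEstimatorSPMD, IncrementalEmpiricalCovariance):
    def _reset(self):
        # Partial results are built locally on each rank, and the SPMD backend
        # does not provide partial_compute_result. self._get_backend and
        # super()._get_backend would both resolve to BaseEstimatorSPMD here,
        # so the non-SPMD implementation is named explicitly.
        self._partial_result = BaseEstimator._get_backend(
            self, "covariance", None, "partial_compute_result"
        )


est = IncrementalEmpiricalCovarianceSPMD()
print(est._partial_result)
```

Keeping the override in the SPMD class leaves the batch class untouched and documents the SPMD-specific limitation where a reader of the SPMD interface would look for it.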

@olegkkruglov (Contributor Author):

It would also be necessary to redefine partial_fit, because partial_compute does not exist in the SPMD backend either. I'm not sure code duplication is a better idea than the current implementation.

@olegkkruglov (Contributor Author):

I should also add that a super call would not work there: BaseEstimator and BaseEstimatorSPMD define the same methods, so if the class has both as parents, we still need to specify explicitly which one should be called.

Contributor:

I would ask you to do a refactoring after this PR is merged; currently this looks like a workaround.
Create a BaseIncrementalEstimator whose _get_backend method returns the required backend depending on the provided method name. This would be common to all incremental algorithms.
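A rough sketch of the proposed base class, assuming each backend is an object that exposes its methods by name. `BaseIncrementalEstimator` is the name from this comment; the `_LOCAL_ONLY` set, the constructor arguments, and the `finalize_compute` method name are hypothetical.

```python
from types import SimpleNamespace


class BaseIncrementalEstimator:
    """Hypothetical base class: pick the backend from the requested method name."""

    # Methods assumed to exist only in the local (non-SPMD) backend.
    _LOCAL_ONLY = frozenset({"partial_compute", "partial_compute_result"})

    def __init__(self, local_backend, spmd_backend=None):
        self._local_backend = local_backend
        self._spmd_backend = spmd_backend

    def _get_backend(self, method, *args, **kwargs):
        # Route by method name: per-rank work stays local, everything else
        # goes to the SPMD backend when one is configured.
        use_local = self._spmd_backend is None or method in self._LOCAL_ONLY
        backend = self._local_backend if use_local else self._spmd_backend
        return getattr(backend, method)(*args, **kwargs)


local = SimpleNamespace(
    partial_compute=lambda batch: ("local", "partial_compute", len(batch)),
    finalize_compute=lambda: ("local", "finalize_compute"),
)
spmd = SimpleNamespace(finalize_compute=lambda: ("spmd", "finalize_compute"))

est = BaseIncrementalEstimator(local, spmd)
print(est._get_backend("partial_compute", [[1.0], [2.0]]))
print(est._get_backend("finalize_compute"))
```

With the routing table in one base class, each incremental algorithm would only declare which of its methods are local-only, instead of hard-coding `BaseEstimator._get_backend` calls.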

@olegkkruglov (Contributor Author):

> Create a BaseIncrementalEstimator whose _get_backend method returns the required backend depending on the provided method name. This would be common to all incremental algorithms.

This might be a good idea; I'll think about it.

if not hasattr(self, "_queue"):
    self._queue = queue
Contributor:

Why is this necessary? It should be handled by universal functionality, not in the estimators.

@olegkkruglov (Contributor Author):

It is for finalize_fit dispatching. finalize_fit has no data argument, so if the user does not provide a queue explicitly, the last queue from partial_fit is used.
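The fallback described here can be sketched as follows; the class is hypothetical and plain strings stand in for SYCL queues. Following this comment, an explicitly passed queue takes precedence, otherwise `finalize_fit` reuses the queue from the most recent `partial_fit` call.

```python
class IncrementalEstimatorSketch:
    """Hypothetical estimator; plain strings stand in for SYCL queues."""

    def __init__(self):
        self._queue = None
        self._n_rows = 0

    def partial_fit(self, X, queue=None):
        self._n_rows += len(X)
        # Keep the queue of the latest partial_fit call for later finalization.
        self._queue = queue
        return self

    def finalize_fit(self, queue=None):
        # finalize_fit gets no data to infer a device from, so an explicit
        # queue wins and the stored one is the fallback.
        used_queue = queue if queue is not None else self._queue
        return used_queue, self._n_rows


est = IncrementalEstimatorSketch()
est.partial_fit([[1.0], [2.0]], queue="gpu:0")
est.partial_fit([[3.0]], queue="gpu:1")
print(est.finalize_fit())             # ('gpu:1', 3)
print(est.finalize_fit(queue="cpu"))  # ('cpu', 3)
```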

Contributor:

Can't the last queue be extracted from the stored policy via the _queue property? A lot of this logic might be unnecessary. @olegkkruglov let me know.

@olegkkruglov (Contributor Author):

That is how it was done before this PR, but it turned out that finalize_fit and partial_fit must use different policies, which is why the policy stored from partial_fit is not acceptable.

Contributor:

Since only partial_fit and reset require a data parallel policy, I would overload partial_fit and reset in the SPMD interface and make sure the correct policy is taken there via a super call. It is a small duplication of code, but at least it's clear to the developer and the user what is going on, and it's located where someone trying to understand the SPMD interface can see the limitations of partial_fit for SPMD in the incremental algorithms. That would also make @samir-nasibli's request for a refactor with an Incremental base class more straightforward.

@samir-nasibli (Contributor), Jul 25, 2024:

> It is for finalize_fit dispatching. finalize_fit has no data argument, so if the user does not provide a queue explicitly, the last queue from partial_fit is used.

Maybe it makes sense to explicitly ask the user to provide a SYCL queue? Otherwise it is a headache.
I understand that this is based on the oneDAL API, but it seems the real fix should be on the oneDAL side. I didn't find any SPMD incremental example that uses policies. The interface itself seems inconvenient for a oneDAL user. It makes sense to redesign the API there first and then expose it here, at the sklearnex level.

@olegkkruglov (Contributor Author):

finalize_fit is called implicitly on the sklearnex side. If we want to keep a scikit-learn-like interface (without an explicit finalize call), the only way to avoid storing the queue is to call finalize after every partial_fit call. That option was rejected at the architecture meeting, which is why keeping the queue in attributes is currently unavoidable.
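A minimal sketch of why the queue has to be stored when finalization is implicit: a fitted attribute triggers finalization on first access, and at that point the user supplies no queue. The class and attribute names (`LazyIncrementalCovariance`, `_need_to_finalize`, `_finalize_fit`) are hypothetical and only illustrate the pattern; strings stand in for queues.

```python
class LazyIncrementalCovariance:
    """Hypothetical estimator: finalization happens on first attribute access."""

    def __init__(self):
        self._queue = None
        self._n_rows = 0
        self._need_to_finalize = False
        self._covariance = None
        self.finalize_calls = 0  # instrumentation for the demo

    def partial_fit(self, X, queue=None):
        self._n_rows += len(X)
        self._queue = queue  # no finalize here; remember where to run it later
        self._need_to_finalize = True
        return self

    def _finalize_fit(self, queue):
        self.finalize_calls += 1
        # Placeholder for the real reduction/finalization on `queue`.
        self._covariance = f"covariance of {self._n_rows} rows computed on {queue}"
        self._need_to_finalize = False

    @property
    def covariance_(self):
        # The user reads an attribute and passes no queue, so the stored one is used.
        if self._need_to_finalize:
            self._finalize_fit(self._queue)
        return self._covariance


est = LazyIncrementalCovariance()
est.partial_fit([[1.0, 2.0]], queue="gpu:0")
est.partial_fit([[3.0, 4.0], [5.0, 6.0]], queue="gpu:0")
print(est.finalize_calls)  # 0
print(est.covariance_)
print(est.finalize_calls)  # 1
```

The rejected alternative would move the `_finalize_fit` call into `partial_fit`, which removes the need to store the queue but pays the finalization cost on every batch.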

@olegkkruglov (Contributor Author):

/intelci: run

@icfaust (Contributor) left a comment:

Some high-level questions about the policy queues and the backend. Also, comments would be nice.

onedal/_device_offload.py: two outdated threads (resolved)

@olegkkruglov (Contributor Author):

/intelci: run

@icfaust (Contributor) left a comment:

I think overloading partial_fit and _reset in the SPMD class is preferable to changing the underlying classes and storing queues. It would be simpler to understand; SPMD problems should stay in SPMD code.

@olegkkruglov (Contributor Author):

> I think overloading partial_fit and _reset in the SPMD class is preferable to changing the underlying classes and storing queues. It would be simpler to understand; SPMD problems should stay in SPMD code.

Why is storing the queue worse than storing the policy? As far as I can see, storing the queue instead of the policy is unavoidable: the policy from partial_fit can't be used in finalize_fit, and that was the only reason the policy was stored.
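The argument can be illustrated with hypothetical policy stand-ins: a stored data-parallel policy has the wrong type for finalization, whereas a stored queue can be used to build whichever policy each step needs. `DataParallelPolicy` and `SPMDPolicy` here are plain dataclasses, not the onedal4py policy classes, and strings stand in for queues.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataParallelPolicy:
    queue: str


@dataclass(frozen=True)
class SPMDPolicy:
    queue: str


class SPMDIncrementalSketch:
    def __init__(self):
        self._queue: Optional[str] = None

    def partial_fit(self, X, queue: str) -> DataParallelPolicy:
        # Local, per-rank step: build a data-parallel policy from the queue.
        self._queue = queue
        return DataParallelPolicy(queue)

    def finalize_fit(self, queue: Optional[str] = None) -> SPMDPolicy:
        # Collective step: build an SPMD policy from the same stored queue.
        return SPMDPolicy(queue if queue is not None else self._queue)


est = SPMDIncrementalSketch()
local_policy = est.partial_fit([[1.0, 2.0]], queue="gpu:0")
final_policy = est.finalize_fit()
print(type(local_policy).__name__, type(final_policy).__name__, final_policy.queue)
```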

@olegkkruglov (Contributor Author):

/intelci: run

@olegkkruglov added the enhancement (New feature or request) label Jul 22, 2024
@olegkkruglov (Contributor Author):

/intelci: run

@olegkkruglov (Contributor Author):

/intelci: run

@olegkkruglov (Contributor Author):

> Generally good to go. Given the issues observed with dtypes in #1961, could you also parametrize dtype here just to see what it does to the results? (No dtype assert necessary this time.)

Done.

@olegkkruglov (Contributor Author):

/intelci: run

@icfaust (Contributor) left a comment:

Approval depends on @samir-nasibli's requests (docstrings at least).

@samir-nasibli (Contributor) left a comment:

Thank you @olegkkruglov!
Now it looks good to me!
Assuming CI is green, please share the internal CI job link.

I expect a quick follow-up refactoring based on the tickets created, plus the docstrings mentioned.
Functionally, I am good to go with this PR as is, but the refactoring should be done before CF.

@olegkkruglov merged commit c214145 into uxlfoundation:main Sep 4, 2024
9 of 11 checks passed
Labels: enhancement (New feature or request)
6 participants