Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KMeans out of preview namespace #1634

Closed

Conversation

md-shafiul-alam
Copy link
Contributor

@md-shafiul-alam md-shafiul-alam commented Jan 4, 2024

Moved the changes to #1770

Copy link
Contributor

@samir-nasibli samir-nasibli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partially reviewed.

@md-shafiul-alam
Copy link
Contributor Author

/intelci: run

@md-shafiul-alam md-shafiul-alam marked this pull request as ready for review January 18, 2024 13:05
sklearnex/cluster/tests/test_kmeans.py Outdated Show resolved Hide resolved
sklearnex/cluster/_common.py Outdated Show resolved Hide resolved
deselected_tests.yaml Outdated Show resolved Hide resolved
daal4py/sklearn/cluster/_k_means_0_23.py Show resolved Hide resolved
onedal/cluster/kmeans.py Show resolved Hide resolved
onedal/cluster/kmeans.py Show resolved Hide resolved
sklearnex/cluster/k_means.py Show resolved Hide resolved
Copy link
Contributor

@samir-nasibli samir-nasibli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For n_jobs issue, fix could be addressed after #1622 merge

daal4py/sklearn/cluster/_k_means_0_23.py Show resolved Hide resolved
Comment on lines +35 to +39
# copied from preview, clusters centers for "full" method are different from "elkan", needs investigation
- cluster/tests/test_k_means.py::test_kmeans_elkan_results
- cluster/tests/test_k_means.py::test_unit_weights_vs_no_weights[KMeans-dense] <1.2
- cluster/tests/test_k_means.py::test_unit_weights_vs_no_weights[42-KMeans-dense] >=1.2
- cluster/tests/test_k_means.py::test_predict_sample_weight_deprecation_warning[KMeans] >=1.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be discussed whether they are a blocker for moving out from preview

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure they are still failing?

onedal/cluster/kmeans.py Outdated Show resolved Hide resolved
onedal/cluster/kmeans.py Outdated Show resolved Hide resolved
onedal/cluster/kmeans.py Show resolved Hide resolved
@icfaust
Copy link
Contributor

icfaust commented Feb 20, 2024

Before we merge this, I want to merge master into this branch and rerun CI, not just resolve conflicts. I will review this once it is brought up to date. I've been monkeying around with sklearnex CI a fair bit since the last commit, and want to see how it performs.

@icfaust
Copy link
Contributor

icfaust commented Feb 21, 2024

/intelci: run

@icfaust
Copy link
Contributor

icfaust commented Feb 21, 2024

First task will get a green public CI, second step will be to comment out all kmeans-related deselected tests (including GPU) and see what we can reactivate

Note, I will delay this because there are some changes which need to be made first, and this will be useless at the moment.

Comment on lines -356 to -363
# New failing sklearn1.4.1 tests for kmeans associated with incorrect n_iter_ values in daal4py
- cluster/tests/test_k_means.py::test_relocating_with_duplicates[lloyd-dense] >=1.4
- cluster/tests/test_k_means.py::test_relocating_with_duplicates[lloyd-sparse_matrix] >=1.4
- cluster/tests/test_k_means.py::test_relocating_with_duplicates[lloyd-sparse_array] >=1.4
- cluster/tests/test_k_means.py::test_relocating_with_duplicates[elkan-dense] >=1.4
- cluster/tests/test_k_means.py::test_relocating_with_duplicates[elkan-sparse_matrix] >=1.4
- cluster/tests/test_k_means.py::test_relocating_with_duplicates[elkan-sparse_array] >=1.4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are somehow now failing on this PR, but wasn't failing in preview in the main branch. Something unusual is going on, because that shouldn't logically be the case.

sklearnex/cluster/k_means.py Show resolved Hide resolved
Comment on lines +105 to +106
n_iter_, inertia_ = None, None
labels_, cluster_centers_ = None, None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless absolutely necessary, these should not be defined as class attributes.

verbose=0,
random_state=None,
copy_x=True,
algorithm="auto",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based off of a PR I just merged, this should fail for sklearn1.1 for init signature matching. Something unusual is going on in testing.

self._save_attributes()

@wrap_output_data
def predict(self, X):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also fail for a test looking for signature matching of methods. Again, something is not occurring properly in CI.

sklearnex/cluster/k_means.py Show resolved Hide resolved
sklearnex/cluster/k_means.py Show resolved Hide resolved
sklearnex/cluster/k_means.py Show resolved Hide resolved
sklearnex/cluster/k_means.py Show resolved Hide resolved
sklearnex/cluster/k_means.py Show resolved Hide resolved
@icfaust
Copy link
Contributor

icfaust commented Feb 21, 2024

I am glad that the BaseKMeans object has been removed, its really unnecessary. There is a lot of work to be done, it may be good to reference the current version of KMeans in preview to match n_jobs changes etc. There is some work that needs to be done before this is pulled out of preview. @md-shafiul-alam please message me if you have any questions.

@icfaust
Copy link
Contributor

icfaust commented Feb 21, 2024

The PR is relying on daal4py still, last commit as evidence, likely due to the daal_check_version. This needs to be sorted out first. This can change test coverage and performance, and will likely need be retested/re-benchmarked.

@md-shafiul-alam md-shafiul-alam deleted the msa-kmeans-oop branch March 15, 2024 04:12
@md-shafiul-alam md-shafiul-alam restored the msa-kmeans-oop branch March 15, 2024 04:15
@md-shafiul-alam md-shafiul-alam mentioned this pull request Mar 22, 2024
8 tasks
@md-shafiul-alam
Copy link
Contributor Author

Moved the changes to #1770

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants