[ENH] functional n_jobs parameter for knn classifier #2478
Comments
Thanks for the benchmarking job! Could you try switching the joblib backend to
@baraline @Ramana-Raja How about parallelizing the loop in
That can work as well, yes, but then you need to balance the number of threads (for kneighbor) against the number of processes (one per sample or group of samples to predict).
Maybe we can limit the number of threads per process to the maximum CPU count divided by n_jobs?
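A minimal sketch of that split, assuming the CPU count comes from `os.cpu_count()`. The helper name `threads_per_process` is hypothetical and not part of any library; the result could then be handed to whatever controls the inner thread pool (for example `numba.set_num_threads`, if numba is the inner layer).

```python
import os


def threads_per_process(n_jobs, total_cpus=None):
    """Split the available CPUs evenly across n_jobs worker processes.

    Hypothetical helper: returns how many threads each process may use.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer")
    if total_cpus is None:
        # os.cpu_count() can return None on some platforms.
        total_cpus = os.cpu_count() or 1
    # Floor division, but never drop below one thread per process.
    return max(1, total_cpus // n_jobs)


# With 16 CPUs and 4 processes, each process gets 4 threads.
print(threads_per_process(n_jobs=4, total_cpus=16))
```

Asking for more processes than there are CPUs falls back to one thread per process, which still oversubscribes the machine, so a real implementation may also want to cap n_jobs.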
This approach also doesn’t seem to work as intended if the data is small. It actually ends up making the execution time worse, as you can see below
@Ramana-Raja That behavior is expected. It occurs because the time required for process creation and context switching exceeds the compute time. To address this, analyze how the problem scales and plot a graph comparing execution times with and without parallelization. The intersection point will indicate the optimal input size where parallelism becomes beneficial. Ideally, this function should allow for dynamically switching between single-threaded and multithreaded execution.
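The intersection point can also be read off programmatically once the two timing curves have been measured. The sketch below assumes the input sizes are sorted in ascending order and that both timing lists were collected on the same machine; `find_crossover` is a hypothetical helper, and the numbers in the usage lines are placeholders, not measurements.

```python
def find_crossover(sizes, serial_times, parallel_times):
    """Return the smallest size from which the parallel run stays faster.

    Assumes `sizes` is sorted ascending. Returns None if the parallel
    version is not faster at the largest measured size.
    """
    crossover = None
    for size, t_serial, t_parallel in zip(sizes, serial_times, parallel_times):
        if t_parallel < t_serial:
            if crossover is None:
                crossover = size  # first size where parallel wins
        else:
            crossover = None  # parallel lost again, so reset
    return crossover


# Placeholder timings in seconds, for illustration only.
sizes = [10, 100, 1000, 10000]
serial = [0.001, 0.01, 0.1, 1.0]
parallel = [0.05, 0.05, 0.08, 0.3]
print(find_crossover(sizes, serial, parallel))
```

The measured crossover is only valid for the hardware it was measured on, so it is better treated as a starting value than as a constant.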
This seems to depend on the CPU, right? The optimal data size might vary for different users with different CPUs. I think the best approach is to leave it configurable as a hyperparameter.
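One way such a hyperparameter could look is a size threshold below which the code stays serial. This is a sketch only: the function name `map_maybe_parallel` and the parameter `min_items_for_parallel` are hypothetical, and a thread pool from the standard library stands in for whatever backend the classifier would really use.

```python
from concurrent.futures import ThreadPoolExecutor


def map_maybe_parallel(func, items, n_jobs=1, min_items_for_parallel=1000):
    """Apply func to every item, going parallel only for large inputs.

    `min_items_for_parallel` is the user-tunable threshold (hypothetical
    name); below it the plain serial loop is used regardless of n_jobs.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer")
    items = list(items)
    if n_jobs == 1 or len(items) < min_items_for_parallel:
        # Small input: pool start-up would cost more than it saves.
        return [func(x) for x in items]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map preserves the input order of the results.
        return list(pool.map(func, items))


squares = map_maybe_parallel(lambda x: x * x, range(5), n_jobs=4,
                             min_items_for_parallel=2)
print(squares)
```

Threads only help here if `func` releases the GIL (as compiled numba or NumPy code can); for pure Python work a process-based backend would be needed instead.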
Makes sense, @baraline wdyt?
I think the simplest would be to offload such hassle to the numba compiler. We could use the existing functions of the distance module (e.g. euclidean_pairwise_distance) to compute the distance matrix in parallel, by adding a This would necessitate a change in such functions for all distances though, @chrisholder what do you think?
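To illustrate what offloading to the numba compiler could look like, here is a sketch of a pairwise Euclidean distance that lets numba parallelize the outer loop. It is an assumption that the comment refers to numba's `parallel=True` option with `prange`; the function below is a standalone toy on 2D arrays (one row per sample), not the distance module's actual `euclidean_pairwise_distance`. If numba is not installed, the fallback keeps the code runnable as plain Python.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without numba (serial, slow).
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def pairwise_euclidean(X, Y):
    """Distance matrix between rows of X (n, d) and rows of Y (m, d)."""
    n = X.shape[0]
    m = Y.shape[0]
    d = X.shape[1]
    out = np.empty((n, m))
    # prange lets numba spread the rows of X across threads; each
    # iteration writes to its own row of `out`, so there are no races.
    for i in prange(n):
        for j in range(m):
            acc = 0.0
            for k in range(d):
                diff = X[i, k] - Y[j, k]
                acc += diff * diff
            out[i, j] = np.sqrt(acc)
    return out


X = np.array([[0.0, 0.0], [3.0, 4.0]])
Y = np.array([[0.0, 0.0]])
print(pairwise_euclidean(X, Y))
```

With this design the number of threads is controlled by numba rather than by joblib, which is where the question of mapping `n_jobs` onto it would come in.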
Describe the feature or idea you want to propose
The current n_jobs parameter does not do anything in the knn classifier.
Describe your proposed solution
Make use of it! How exactly is TBD. If that is not possible, a warning should at least be raised.
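The fallback of raising a warning could look roughly like the sketch below. `check_n_jobs` is a hypothetical helper, not an existing function of the library, and the wording of the message is only a suggestion.

```python
import warnings


def check_n_jobs(n_jobs):
    """Warn when n_jobs is set but has no effect, then return 1.

    Hypothetical helper: the estimator would call this on the user's
    n_jobs value and run with the returned number of jobs.
    """
    if n_jobs not in (None, 1):
        warnings.warn(
            "n_jobs is currently ignored by this estimator; "
            "running single-threaded.",
            UserWarning,
            stacklevel=2,  # point at the caller, not at this helper
        )
    return 1


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    effective = check_n_jobs(4)
print(effective, len(caught))
```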
Describe alternatives you've considered, if relevant
No response
Additional context
I was curious, after looking at the sequentia benchmark, about why we were so slow...