k-means update for pyEXP #111

Merged
merged 3 commits into from
Feb 26, 2025

Conversation

The9Cat
Contributor

@The9Cat The9Cat commented Feb 25, 2025

Problem

The expMSSA k-means implementation printed to stdout or to a file. For some peculiar reason, the output in Jupyter cells was being blocked by some buffer condition.

Resolution

Nothing about the algorithm or its computation has been changed. Rather, the k-means code was refactored into three routines: kmeans(), kmeansChannel(), and kmeansPrint(). The first two compute the k-means analysis for all channels simultaneously or for a single channel specified by key, respectively. The kmeansPrint() routine has the same output style as the original kmeans().

The kmeansPrint() version is no longer in the Python interface; it is replaced by the combined-channel k-means, which is the most useful bit of analysis.

I added some additional description of the k-means implementation in the Python-interface doc strings. In short, one uses the standard k-means algorithm, where the points are the trajectory matrices reconstructed from each eigentriple and the distance between two points is the Frobenius norm of the difference between their reconstructed trajectory matrices.
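
For readers who want to see the idea in code, here is a minimal NumPy sketch of Lloyd's k-means with matrices as points and the Frobenius norm as the metric. It is not the expMSSA implementation: the function name `frobenius_kmeans`, the random initialization from existing points, the iteration cap, and the toy input matrices are all assumptions made for illustration.

```python
import numpy as np

def frobenius_kmeans(mats, k, n_iter=100, seed=0):
    """Lloyd's k-means where each point is a matrix and the distance
    between two points is the Frobenius norm of their difference.
    Hypothetical helper, not part of pyEXP."""
    mats = np.asarray(mats, dtype=float)      # shape (n_points, rows, cols)
    n = mats.shape[0]
    rng = np.random.default_rng(seed)
    # Assumption: start the centroids at k distinct input points.
    centers = mats[rng.choice(n, size=k, replace=False)].copy()
    ids = np.full(n, -1)
    for _ in range(n_iter):
        # Frobenius distance from every point to every centroid, shape (n, k)
        diff = mats[:, None, :, :] - centers[None, :, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=(2, 3)))
        new_ids = dist.argmin(axis=1)
        if np.array_equal(new_ids, ids):
            break                             # assignments stopped changing
        ids = new_ids
        for j in range(k):
            members = mats[ids == j]
            if len(members) > 0:              # leave an empty cluster's centroid in place
                centers[j] = members.mean(axis=0)
    # Distance from each point to the centroid of its assigned cluster
    resid = mats - centers[ids]
    dists = np.sqrt((resid ** 2).sum(axis=(1, 2)))
    return ids, dists

# Toy stand-ins for reconstructed trajectory matrices: two well-separated pairs
toy = [np.full((2, 2), v) for v in (0.0, 1.0, 10.0, 11.0)]
ids, dists = frobenius_kmeans(toy, k=2)
```

With this toy input the first two matrices end up in one cluster and the last two in the other, each at Frobenius distance 1 from its centroid; the numeric cluster labels themselves depend on the initialization.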

Tests

I verified that this routine works (i.e., it no longer hangs on buffering since there is no I/O). The list of cluster ids and the distances from each PC to its cluster centroid are returned to Python as a tuple of arrays. The Part2-Analysis tutorial has been updated to use this method.
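
As a rough illustration of how such a return value might be consumed in a notebook, the sketch below groups PCs by cluster id and sorts each group by distance to the centroid. The helper `group_by_cluster` is hypothetical, the (ids, distances) ordering is assumed from the description above, and the arrays are stand-in values rather than pyEXP output.

```python
import numpy as np

def group_by_cluster(ids, dists):
    """Map each cluster id to a list of (pc_index, distance) pairs,
    sorted by increasing distance to the centroid.
    Hypothetical helper, not part of pyEXP."""
    groups = {}
    for pc, (cid, d) in enumerate(zip(ids, dists)):
        groups.setdefault(int(cid), []).append((pc, float(d)))
    for members in groups.values():
        members.sort(key=lambda item: item[1])
    return groups

# Stand-in values for illustration only, not results from a real analysis
ids = np.array([0, 0, 1, 1, 2])
dists = np.array([0.2, 0.1, 0.5, 0.4, 0.0])

groups = group_by_cluster(ids, dists)
for cid in sorted(groups):
    print(cid, groups[cid])
```

Because the computation returns plain arrays and does no printing of its own, any formatting like this stays on the Python side.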

Comments

This should be a fairly safe PR. The main question is whether we want it at this point? There is a corresponding change to two of the pyEXP-examples/Tutorials notebooks in a local branch, not checked in.

Martin D. Weinberg added 3 commits February 25, 2025 15:31
…e primary routine is now the full channel analysis, not the per-channel analysis in pyEXP.
… by default, a good choice for quasi-periodic systems
Member

@michael-petersen michael-petersen left a comment


All good changes to refactoring the code, thanks! Very readable and clear what is going on.

@michael-petersen
Member

My sense is that it doesn't make sense to cut back on a feature that might be useful -- I agree we haven't yet fully discovered the use case, but maybe that means it is a project ripe for someone looking to get in on software and technique development!

I forgot to say in my previous comment that I tried this and can confirm everything works for me.

@The9Cat
Contributor Author

The9Cat commented Feb 26, 2025

Okay, thanks! Let's merge it...and I'll merge in the notebook changes to EXP-code/pyEXP-examples/#13.

@The9Cat The9Cat merged commit fe0f93e into main Feb 26, 2025
8 checks passed
@The9Cat The9Cat deleted the fixKmeans branch February 26, 2025 17:19