k-means update for pyEXP #111

The9Cat · 2025-02-25T20:44:59Z

Problem

The expMSSA k-means implementation printed its results to stdout or to a file. For some peculiar reason, when run in a Jupyter cell this output was blocked by a buffering condition and the call would hang.

Resolution

Nothing about the algorithm or its computation has been changed. Rather, the k-means code was refactored into three routines: kmeans(), kmeansChannel(), and kmeansPrint(). The first two compute the k-means analysis for all channels simultaneously and for a single channel (specified by the user via its key), respectively. The kmeansPrint() routine produces the same output style as the original kmeans().
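To illustrate the compute/print split, here is a minimal Python sketch of a print-only helper. The name kmeans_print and its signature are hypothetical and not the expMSSA API: it takes cluster IDs and distances that an analysis step has already computed and writes them to a caller-supplied stream, so the analysis itself performs no I/O.

```python
import io
import sys
from typing import TextIO

import numpy as np


def kmeans_print(ids: np.ndarray, dists: np.ndarray, out: TextIO = sys.stdout) -> None:
    """Hypothetical print-only helper: formats precomputed results, computes nothing."""
    out.write(f"{'PC':>4} {'cluster':>8} {'distance':>14}\n")
    for pc, (cid, d) in enumerate(zip(ids, dists)):
        out.write(f"{pc:>4d} {int(cid):>8d} {float(d):>14.6e}\n")


# Usage: capture the table in an in-memory buffer instead of writing to stdout.
buf = io.StringIO()
kmeans_print(np.array([0, 1]), np.array([0.5, 1.25]), out=buf)
print(buf.getvalue(), end="")
```

Because the output stream is a parameter, a notebook user can redirect the table or skip printing entirely without touching the analysis code.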

kmeansPrint() is no longer exposed in the Python interface; it is replaced there by the combined-channel k-means, which is the most useful part of the analysis.

I added a description of the k-means implementation to the Python-interface docstrings. In short, it uses the standard k-means algorithm, where the points are the trajectory matrices reconstructed from each eigentriple and the distance between two points is the Frobenius norm of the difference between their reconstructed trajectory matrices.
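The clustering described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the expMSSA implementation: it handles a single series, uses a plain Lloyd iteration with randomly chosen initial centroids, and all function names are hypothetical. Each eigentriple (s_i, u_i, v_i) of the trajectory matrix X gives a reconstructed matrix X_i = s_i u_i v_i^T. Since the Frobenius norm of a matrix equals the Euclidean norm of its flattened entries, standard k-means on the flattened X_i uses exactly the distance described.

```python
import numpy as np


def trajectory_matrix(series: np.ndarray, window: int) -> np.ndarray:
    """Hankel (trajectory) matrix: column j is series[j : j + window]."""
    n_cols = len(series) - window + 1
    return np.column_stack([series[j:j + window] for j in range(n_cols)])


def elementary_matrices(traj: np.ndarray) -> np.ndarray:
    """Rank-one matrices s_i * u_i v_i^T, one per eigentriple, stacked on axis 0."""
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    return np.einsum("i,ki,ij->ikj", s, u, vt)


def kmeans_frobenius(mats: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means where each point is a matrix and distance is ||A - B||_F.

    Returns (cluster id per matrix, distance from each matrix to its centroid).
    Random initial centroids are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    pts = mats.reshape(len(mats), -1)  # Frobenius norm == Euclidean norm of flattened matrix
    centroids = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
        ids = d.argmin(axis=1)
        # Centroid = element-wise mean of member matrices; keep old centroid if empty.
        new = np.array([pts[ids == c].mean(axis=0) if np.any(ids == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
    ids = d.argmin(axis=1)
    return ids, d[np.arange(len(pts)), ids]
```

A useful sanity check is that the elementary matrices sum back to X, since the eigentriple decomposition is exact; the cluster centroids are then simply element-wise means of trajectory matrices.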

Tests

I verified that this routine works (i.e., it no longer hangs on buffering, since it performs no I/O). The cluster IDs and the distance from each PC to its cluster centroid are returned to Python as a tuple of arrays. The Part2-Analysis tutorial has been updated to use this method.
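On the Python side, a tuple of (cluster IDs, distances) is easy to summarize. The sketch below assumes the first array holds the cluster ID of each PC and the second its distance to the assigned centroid; that ordering, the helper name group_by_cluster, and the sample values are illustrative assumptions, not pyEXP output.

```python
import numpy as np


def group_by_cluster(ids: np.ndarray, dists: np.ndarray) -> dict:
    """Map cluster id -> list of (PC index, distance), closest PCs first."""
    groups: dict = {}
    for pc in np.argsort(dists, kind="stable"):
        groups.setdefault(int(ids[pc]), []).append((int(pc), float(dists[pc])))
    return groups


# Hypothetical example values, standing in for the returned tuple.
ids = np.array([0, 0, 1, 1, 2])
dists = np.array([0.1, 0.3, 0.05, 0.2, 0.0])
for cid, members in sorted(group_by_cluster(ids, dists).items()):
    print(cid, members)
```

Listing each cluster's members by distance makes it easy to spot which PCs sit near a centroid and which are marginal members.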

Comments

This should be a fairly safe PR. The main question is whether we want it at this point. There are corresponding changes to two of the pyEXP-examples/Tutorials notebooks in a local branch that are not yet checked in.

michael-petersen

All good changes to the refactored code, thanks! Very readable, and it is clear what is going on.

michael-petersen · 2025-02-26T15:57:23Z

My sense is that we shouldn't cut back on a feature that might be useful. I agree we haven't yet fully discovered the use case, but maybe that makes it a project ripe for someone looking to get into software and technique development!

I forgot to say in my previous comment that I tried this and can confirm everything works for me.

The9Cat · 2025-02-26T17:18:57Z

Okay, thanks! Let's merge it, and I'll merge in the notebook changes to EXP-code/pyEXP-examples/#13.

Martin D. Weinberg added 3 commits February 25, 2025 15:31

An updated k-means analysis that separates printing from analysis. Th…

223b68b

…e primary routine is now the full channel analysis, not the per-channel analysis in pyEXP.

Make the random shuffle seed for KMeans the default in expMSSA

3074340

Allow the user to specify the seeding strategy for k-means. 2-strided…

4f40e7c

… by default, a good choice for quasi-periodic systems

michael-petersen approved these changes Feb 26, 2025

The9Cat merged commit fe0f93e into main Feb 26, 2025
8 checks passed

The9Cat deleted the fixKmeans branch February 26, 2025 17:19