Problem
The `expMSSA` k-means implementation printed to stdout or to a file. For some peculiar reason, the output in Jupyter cells was being blocked by some buffer condition.
Resolution
Nothing about the algorithm or its computation has been changed. Rather, the k-means code was refactored into `kmeans()`, `kmeansChannel()`, and `kmeansPrint()`. The first two compute the k-means analysis for all channels simultaneously or for a single channel specified by a user-supplied key, respectively. The `kmeansPrint()` routine has the same output style as the original `kmeans()`. The `kmeansPrint()` version is no longer in the Python interface; it is replaced by the combined-channel k-means, which is the most useful bit of analysis.
I added some additional description of the k-means implementation in the Python-interface doc strings. In short, one uses the standard k-means algorithm where the points are the trajectory matrices reconstructed from each eigentriple and the distance between two points is the Frobenius norm of the difference between the reconstructed trajectory matrices.
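For readers unfamiliar with the setup, the clustering described above can be sketched in NumPy. This is only an illustration under stated assumptions, not the `expMSSA` code: the function name `kmeans_frobenius`, its parameters, and the random choice of initial centroids are all hypothetical. It relies on the fact that the Frobenius norm of a matrix equals the Euclidean norm of its flattened entries, so standard k-means applies directly.

```python
import numpy as np

def kmeans_frobenius(mats, k, n_iter=100, seed=0):
    """Hypothetical sketch: Lloyd's k-means on equal-shaped matrices.

    mats : array of shape (n, rows, cols), e.g. one reconstructed
           trajectory matrix per eigentriple (an assumption here).
    Returns (ids, dists): the cluster id of each matrix and the
    Frobenius distance from each matrix to its cluster centroid.
    """
    mats = np.asarray(mats, dtype=float)
    n = mats.shape[0]
    # Frobenius norm of a matrix == Euclidean norm of its flattened entries
    flat = mats.reshape(n, -1)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct input points chosen at random
    centroids = flat[rng.choice(n, size=k, replace=False)].copy()
    ids = np.full(n, -1)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k)
        d = np.linalg.norm(flat[:, None, :] - centroids[None, :, :], axis=2)
        new_ids = d.argmin(axis=1)
        if np.array_equal(new_ids, ids):
            break  # assignments stable: converged
        ids = new_ids
        for j in range(k):
            members = flat[ids == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(flat - centroids[ids], axis=1)
    return ids, dists

# Toy usage: six 2x2 matrices forming two well-separated groups
mats = np.stack([c * np.ones((2, 2)) for c in (0.0, 0.1, 0.2, 10.0, 10.1, 10.2)])
ids, dists = kmeans_frobenius(mats, k=2)
```

The sketch returns its result as a tuple of two arrays, mirroring the tuple-of-arrays return described for the Python interface; the actual pyEXP signature may differ.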
Tests
I verified that this routine works (i.e. it no longer hangs on buffering since there is no I/O). The list of cluster ids and the distances from each PC to its cluster centroid are returned to Python as a tuple of arrays. The `Part2-Analysis` tutorial has been updated to use this method.
Comments
This should be a fairly safe PR. The main question is whether we want it at this point. There is a corresponding change to two of the `pyEXP-examples/Tutorials` notebooks in a local branch, not checked in.