started playing around with this. a demo of how this might support EDA and better PRs here, see also #39
limitations:
silent, afaict undocumented limit of ~128 dims for vector inputs. the Johnson-Lindenstrauss lemma says we can almost definitely do a random projection and still preserve meaningful structure, especially since we're going to UMAP down to two dims anyway. as a shortcut, assuming there's no privileged basis (possibly too strong), i instead projected onto the first 128 canonical basis vectors, aka vec[:128]
silent (but documented) limit of 10k rows in a table
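for comparison, a minimal sketch of the two ways to get under the ~128-dim limit: the truncation described above and an actual Gaussian random projection. this uses plain numpy and nothing wandb-specific; `MAX_DIMS`, `truncate`, and `random_project` are names made up for illustration, and 128 is just the limit observed above, not a documented constant.

```python
import numpy as np

MAX_DIMS = 128  # limit observed in this issue; assumed, not documented


def truncate(vectors: np.ndarray, k: int = MAX_DIMS) -> np.ndarray:
    """Keep the first k coordinates, i.e. vec[:k] for every row."""
    return vectors[:, :k]


def random_project(vectors: np.ndarray, k: int = MAX_DIMS, seed: int = 0) -> np.ndarray:
    """Project rows onto k random Gaussian directions.

    Entries are scaled by 1/sqrt(k) so squared distances are preserved
    in expectation. A fixed seed keeps the projection identical across
    uploads, so the same input vector always maps to the same output.
    """
    d = vectors.shape[1]
    if d <= k:
        return vectors  # already small enough, nothing to do
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(d, k)) / np.sqrt(k)
    return vectors @ projection
```

truncation is cheaper and needs no stored matrix, but it only keeps structure if the information is spread roughly evenly across coordinates. the random projection doesn't need that assumption, at the cost of one extra matmul.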
incorporating this into the workflow:
could upload a new wandb table every time we refresh the vector index, but that feels excessive. if we upload regularly, we want it to be a small diff
desire for small/meaningful diffs interacts poorly with random subsampling. could use lexical ordering of hashes to get a pseudo-random sample? but it's unclear that wandb artifact dedup works on a sub-file level
rather than doing it every time, could add a separate command for artifact storage and run intermittently
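a rough sketch of the hash-ordering idea: sort row ids by their hash and keep the first n. the sample looks random with respect to content, but adding or removing one document changes at most one row in and one row out. `stable_sample` and `MAX_ROWS` are hypothetical names, the ids are assumed to be strings, and this says nothing about whether wandb artifact dedup would actually benefit from the smaller diff.

```python
import hashlib

MAX_ROWS = 10_000  # the documented table row limit mentioned above


def stable_sample(doc_ids, n=MAX_ROWS):
    """Return the n ids whose SHA-256 hex digests sort first.

    The result depends only on the set of ids, not on their input
    order, so repeated runs over a slowly changing index pick
    almost the same rows.
    """
    def hash_key(doc_id):
        return hashlib.sha256(doc_id.encode("utf-8")).hexdigest()

    return sorted(doc_ids, key=hash_key)[:n]
```

usage would be something like `rows = stable_sample(all_ids)` before building the table, so each refresh touches only the rows whose documents actually changed.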
https://twitter.com/_ScottCondron/status/1620347174692454400