Avoid loading full data in memory? #16
Hi Loan,
Hi Jesko,
I see two ways to address transformations:
Not sure whether I will have time to create a PR soon, but let me know what you think.
You're right, many things could be computed feature-wise or decently approximated. Still, I feel that things like quantifying distances between distributions, which requires computing a covariance matrix or a robust estimator of dispersion, would remain a scientific challenge and not just an implementation problem.
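To illustrate the feature-wise case only, here is a minimal sketch of per-feature mean and variance accumulated chunk by chunk with Welford's update, so that only one chunk is held in memory at a time. The names `RunningFeatureStats` and `iter_chunks` are hypothetical and not part of this project, and the sketch does not cover robust dispersion estimators or distances between distributions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


class RunningFeatureStats:
    """Per-feature mean and variance, updated one row at a time (Welford)."""

    def __init__(self, n_features: int) -> None:
        self.count = 0
        self.mean = [0.0] * n_features
        # Running sum of squared deviations from the current mean, per feature.
        self.m2 = [0.0] * n_features

    def update(self, chunk: Iterable[Sequence[float]]) -> None:
        """Fold one chunk of rows into the running statistics."""
        for row in chunk:
            self.count += 1
            for j, x in enumerate(row):
                delta = x - self.mean[j]
                self.mean[j] += delta / self.count
                self.m2[j] += delta * (x - self.mean[j])

    def variance(self) -> List[float]:
        """Sample variance (divides by n - 1); needs at least two rows."""
        if self.count < 2:
            raise ValueError("variance needs at least two rows")
        return [m / (self.count - 1) for m in self.m2]


def iter_chunks(
    rows: Iterable[Sequence[float]], chunk_size: int
) -> Iterator[List[Sequence[float]]]:
    """Yield successive lists of at most chunk_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Stand-in for rows streamed from disk; a real source would be a file reader.
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    stats = RunningFeatureStats(n_features=2)
    for chunk in iter_chunks(data, chunk_size=2):
        stats.update(chunk)
    print(stats.mean, stats.variance())
```

The accumulator keeps only a count and two vectors of length `n_features`, so memory use does not grow with the number of rows. Operations that mix rows and columns need more than this.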
As a long-term improvement, it would be great to be able to construct and run the pipeline without loading the whole dataset in memory, processing it iteratively instead (hard to achieve when operations are performed on both rows and columns).