Online, but what about streaming? #13
Comments
I'll have a look into it. I think the actual algorithm is capable of that, but the implementation is not engineered for it. I guess it will need only some refactoring and no actual fiddling around with the math.
Ok. I implemented a class with the ability to |
FWIW I tried an approach where I started the run-lengths matrix with a list |
I see. My first guess for a more efficient approximation would be, that we could ignore many A sketch:
Thus we ignore all probabilities smaller |
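For readers following along, here is a minimal sketch of the thresholding idea discussed above. It is not the implementation in this repository or in NAB. It assumes a Gaussian observation model with known variance, a constant hazard rate, and a hypothetical `prune_threshold` parameter (the actual cutoff proposed in the comment is not shown). The class name `StreamingChangepointSketch` and all parameter names are made up for illustration. Instead of preallocating the run-length matrix, it keeps only the current run-length hypotheses and drops those whose probability falls below the threshold, so the length of the stream never has to be known in advance.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class StreamingChangepointSketch:
    """Hypothetical streaming detector; not the API of this repository."""

    def __init__(self, hazard=0.01, obs_var=1.0, prior_mean=0.0,
                 prior_var=10.0, prune_threshold=1e-4):
        self.hazard = hazard                    # constant changepoint probability per step
        self.obs_var = obs_var                  # assumed known observation variance
        self.prior_mean = prior_mean
        self.prior_var = prior_var
        self.prune_threshold = prune_threshold  # assumed value; tune per dataset
        # One tuple per surviving hypothesis:
        # (run_length, probability, posterior_mean, posterior_var)
        self.hyps = [(0, 1.0, prior_mean, prior_var)]

    def update(self, x):
        """Consume one observation and return the most probable run length."""
        grown, cp_mass = [], 0.0
        for r, p, m, v in self.hyps:
            # Predictive probability of x under this run's posterior.
            joint = p * normal_pdf(x, m, v + self.obs_var)
            cp_mass += joint * self.hazard
            # Conjugate normal update of the mean for the run that continues.
            new_v = 1.0 / (1.0 / v + 1.0 / self.obs_var)
            new_m = new_v * (m / v + x / self.obs_var)
            grown.append((r + 1, joint * (1.0 - self.hazard), new_m, new_v))
        hyps = [(0, cp_mass, self.prior_mean, self.prior_var)] + grown
        total = sum(h[1] for h in hyps)
        if total == 0.0:
            # Every hypothesis underflowed; restart from the prior.
            self.hyps = [(0, 1.0, self.prior_mean, self.prior_var)]
            return 0
        # Drop negligible run lengths, always keeping the changepoint hypothesis.
        kept = [h for h in hyps
                if h[0] == 0 or h[1] / total >= self.prune_threshold]
        kept_total = sum(h[1] for h in kept)
        self.hyps = [(r, p / kept_total, m, v) for r, p, m, v in kept]
        return self.map_run_length()

    def map_run_length(self):
        """Run length with the highest probability among surviving hypotheses."""
        return max(self.hyps, key=lambda h: h[1])[0]


# Usage: a stream whose mean jumps from about 0 to about 10 halfway through.
detector = StreamingChangepointSketch()
stream = ([0.5 * math.sin(i) for i in range(50)]
          + [10.0 + 0.5 * math.sin(i) for i in range(50, 100)])
for x in stream:
    run_length = detector.update(x)
print(run_length, len(detector.hyps))
```

Memory here is bounded by the number of surviving hypotheses rather than by the stream length. Whether the pruning hurts detection quality would still need to be checked on real datasets.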
This may work well; it's not clear to me before running it on some datasets. I've implemented a solution as an anomaly detection algorithm in NAB, and will link it here once I clean it up a bit and push it. The main idea is that, because we're only interested in the points at which the data stream changes (i.e., change points), we can keep only the data that would show us a change point for the current timestep. It actually works slightly better than using the entire preallocated R matrix 😉
The online algorithm is implemented here as an anomaly detector in NAB.
The algorithm runs online, but under the assumption that we know the length of the dataset a priori. What about streaming scenarios where we have a continuous stream of data? Is there an (efficient) way to run the online algorithm without knowing the length of the dataset?