
Introduce configuration for purging least recently used models in anomaly detection plugin #3293

Open
mishavay-aws opened this issue Aug 31, 2023 · 1 comment
Labels: enhancement (New feature or request)

mishavay-aws (Contributor) commented Aug 31, 2023

Is your feature request related to a problem? Please describe.
Currently (as of release v2.4), the anomaly detection processor offers cardinality key support via the identification_keys property. The processor creates up to 5000 models, one per distinct key/value combination. After the limit is reached, the plugin stops creating new models, which can be observed via the CardinalityOverflow metric.
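For context, a minimal sketch of the current configuration shape, assuming the Data Prepper anomaly_detector processor; the key names and values here are illustrative, not taken from a real pipeline:

```yaml
# Hypothetical pipeline fragment: one model is created per distinct
# value of the identification_keys fields (e.g., per ip_address),
# up to the 5000-model cap described above.
processor:
  - anomaly_detector:
      keys: ["latency"]
      identification_keys: ["ip_address"]
      mode:
        random_cut_forest: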

Describe the solution you'd like
As an alternative to the existing behavior, it would be beneficial to introduce a configuration where the least recently used models are purged and new ones are automatically created in their place. This way, the implementation handles newly arriving cardinality keys more dynamically.
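The eviction policy described above could be sketched with an access-ordered LinkedHashMap, which evicts the least recently used entry once a configured cap is reached. This is a hypothetical illustration only: the class and type names (LruModelCache, the generic model value) are invented here and do not come from the plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of LRU model eviction. V stands in for whatever
// per-entity model type the processor keeps for each
// identification_keys value; none of these names are from the plugin.
public class LruModelCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxModels;

    public LruModelCache(int maxModels) {
        // accessOrder = true: get() moves an entry to the most-recently-used
        // position, so eviction order tracks actual use, not insertion.
        super(16, 0.75f, true);
        this.maxModels = maxModels;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each put(); returning true
        // silently drops the least recently used entry.
        return size() > maxModels;
    }
}
```

With a cap of, say, 5000, a put() for a newly arriving cardinality key would transparently purge the stalest model instead of incrementing CardinalityOverflow.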

Describe alternatives you've considered (Optional)
Other mechanisms for purging unused models can be considered.

Additional context
Add any other context or screenshots about the feature request here.

@sudiptoguha

This is a very helpful request. Now that streaming normalization is implicit in the anomaly detector (a difference from OpenSearch AD up to 2.9; it may be resolved down the road), it may also be possible to compress the thousands of models into a few. There are tradeoffs, of course: using the same model for multiple entities risks desensitizing the model. For example, if entity A never had event X occur while entity B had X occur quite frequently, then no algorithm (information theoretically) could use the joint model to detect that "event X occurred for A and was unusual." That said, there can be significant benefits: the RCF algorithm has been applied to perform anomaly detection over 1000 entities (see aws/random-cut-forest-by-aws#397 and a partial validation from the community in aws/random-cut-forest-by-aws#398). One issue would be serialization/deserialization: the models would need to be partitioned into two pieces, one for entity-specific context (which performs simpler tasks like normalization) and one for the common RCF regression model.
