Is your feature request related to a problem? Please describe.
Currently (as of release v2.4), the anomaly detection processor offers cardinality key support via the identification_keys property. This property creates up to 5000 models, one per key/value combination. After the limit is reached, the plugin stops creating new models, which can be observed via the CardinalityOverflow metric.
Describe the solution you'd like
As an alternative to the existing behavior, it would be beneficial to introduce a configuration where the least recently used models are purged and new ones are automatically created in their place. This way, the implementation handles newly arriving cardinality keys more dynamically.
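A minimal sketch of the proposed least-recently-used purging, assuming a hypothetical ModelCache wrapper (the class, its maxModels parameter, and getOrCreate are illustrative names, not Data Prepper APIs). Java's LinkedHashMap in access order, with removeEldestEntry overridden, gives exactly this behavior: when the configured limit is exceeded, the least recently used model is evicted instead of new keys being refused.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not actual Data Prepper code: an LRU cache keyed by
// the identification_keys value combination, holding one model per key.
public class ModelCache<M> {
    private final Map<String, M> models;

    public ModelCache(final int maxModels) {
        // accessOrder=true: iteration order follows most-recent access,
        // so the eldest entry is the least recently used one.
        this.models = new LinkedHashMap<String, M>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<String, M> eldest) {
                // Purge the LRU model instead of refusing new keys.
                return size() > maxModels;
            }
        };
    }

    // Returns the existing model for this key/value combination, or builds
    // a new one; both paths count as an "access" for LRU ordering.
    public M getOrCreate(final String key, final Function<String, M> factory) {
        return models.computeIfAbsent(key, factory);
    }

    public int size() {
        return models.size();
    }
}
```

The eviction policy lives entirely in removeEldestEntry, so a configurable limit (or an alternative purging mechanism) could be swapped in without touching callers.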
Describe alternatives you've considered (Optional)
Other mechanisms for purging unused models can be considered.
Additional context
Add any other context or screenshots about the feature request here.
This is a very helpful request. Now that streaming normalization is implicit in the anomaly detector (a difference from OpenSearch AD up to 2.9 -- it may be resolved down the road), it may be possible to compress the thousands of models into a few models as well. There are tradeoffs, of course: using the same model for multiple entities risks desensitizing it. For example, suppose entity A never had event X occur, whereas entity B had X occur quite frequently; the net result of using a joint model is that no algorithm (information theoretically) could detect "event X occurred for A and was unusual". That said, there can be significant benefits. The RCF algorithm has been applied to perform anomaly detection over 1000 entities (see aws/random-cut-forest-by-aws#397 and a partial validation from the community in aws/random-cut-forest-by-aws#398). One issue would be serialization/deserialization -- the models would clearly need to be partitioned into two pieces: one for entity-specific contexts, which perform simpler tasks like normalization, and one for the common RCF regression model.
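To make the proposed partitioning concrete, here is a sketch of the entity-specific piece only: a small per-entity streaming-normalization context (Welford's online mean/variance) whose z-scored output would be fed to a single shared RCF model. The class and method names are illustrative assumptions, not the actual random-cut-forest-by-aws API; the shared forest is deliberately left out since only the lightweight context would need to exist per entity.

```java
// Hypothetical sketch of a per-entity context as described in the comment:
// it holds only streaming normalization state, so thousands of entities can
// share one common RCF regression model downstream.
public class EntityContext {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations (Welford's algorithm)

    // Updates the running statistics with a new observation and returns the
    // normalized (z-scored) value that would go to the shared model.
    public double normalize(final double x) {
        count++;
        final double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
        final double std = count > 1 ? Math.sqrt(m2 / (count - 1)) : 1.0;
        return std > 0 ? (x - mean) / std : 0.0;
    }

    public double mean() {
        return mean;
    }
}
```

Serializing this context is cheap (three numbers per entity), which is what makes the two-piece split attractive: the expensive forest state is stored once, the per-entity state stays tiny.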