Problems related to pruning old AppUsageEvent and ServiceUsageEvent records #4182

joyvuu-dave · 2025-01-21T21:05:49Z

Problem

We keep running into a problem with how old AppUsageEvent and ServiceUsageEvent records are pruned. Start records are often removed before the corresponding Apps or Services have actually stopped, which makes it difficult to determine how long those resources have been running. A consumer that starts polling after a start record has been pruned may never learn the true start time of that App or Service.

Challenges and Use of Purge/Seed

A specific pain point relates to the destructively_purge_all_and_reseed endpoints for App and Service Usage Events. These endpoints are sometimes used when event tables become inconsistent, often because start records were removed prematurely. While destructively_purge_all_and_reseed recreates start records for the resources that are currently running, it assigns new start timestamps that do not reflect actual creation or launch times. As a result, usage metrics derived from those records can understate how long a resource has been running.
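To make the distortion concrete, here is a minimal sketch with made-up timestamps. The dates and the running_hours helper are purely illustrative assumptions and are not taken from the Cloud Controller code base.

```python
from datetime import datetime, timezone

# Hypothetical timestamps, chosen only for illustration.
actual_start = datetime(2024, 11, 1, tzinfo=timezone.utc)  # when the app really started
reseed_time = datetime(2025, 1, 15, tzinfo=timezone.utc)   # when purge-and-reseed ran
now = datetime(2025, 1, 21, tzinfo=timezone.utc)           # when the consumer computes usage


def running_hours(start, end):
    """Hours elapsed between two timezone-aware timestamps."""
    return (end - start).total_seconds() / 3600


actual = running_hours(actual_start, now)    # based on the real start time
reported = running_hours(reseed_time, now)   # based on the reseeded start record
missing = actual - reported                  # usage the consumer can no longer see

print(f"actual={actual:.0f}h reported={reported:.0f}h missing={missing:.0f}h")
```

In this example the consumer would account for 6 days of usage instead of 81, because the only start record it can see was created at reseed time.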

Core Problems

Pruning Before Completion
- The system prunes old records to manage database growth. However, if an App/Service remains running for a long period, its start record may be deleted before a stop record exists.
- A newly added or recovering consumer will then not see accurate start times.
Extended Downtime Leading to Missed Events
- A usage-event polling service may go offline for an extended period (e.g., after an unnoticed crash). By the time it resumes polling, older events may have been pruned, leaving gaps in its historical data.
Accurate State Visibility
- Once critical events have been removed, it becomes difficult to piece together which Apps or Services are still running. This forces reliance on destructively_purge_all_and_reseed to reset the data, at the cost of losing accurate historical start times.
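The problems above can be illustrated with a small sketch of a consumer that rebuilds state by replaying events. The UsageEvent class, the replay function, and the "STARTED"/"STOPPED" state values are a simplified, hypothetical model for illustration, not the actual Cloud Controller schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    resource_guid: str
    state: str            # "STARTED" or "STOPPED" in this simplified model
    created_at: datetime


def replay(events):
    """Rebuild running resources and finished intervals from an event stream."""
    running = {}    # resource_guid -> start time
    finished = []   # (resource_guid, start time or None, stop time)
    for event in sorted(events, key=lambda e: e.created_at):
        if event.state == "STARTED":
            running[event.resource_guid] = event.created_at
        elif event.state == "STOPPED":
            # None means the start record was never seen (e.g. already pruned).
            started_at = running.pop(event.resource_guid, None)
            finished.append((event.resource_guid, started_at, event.created_at))
    return running, finished


def day(n):
    return datetime(2025, 1, n, tzinfo=timezone.utc)


all_events = [
    UsageEvent("app-a", "STARTED", day(1)),   # long-running, never stopped
    UsageEvent("app-b", "STARTED", day(10)),
    UsageEvent("app-b", "STOPPED", day(12)),
]

# Simulate age-based pruning: everything before the cutoff is deleted.
cutoff = day(5)
after_pruning = [e for e in all_events if e.created_at >= cutoff]

running_full, _ = replay(all_events)
running_pruned, finished_pruned = replay(after_pruning)

print("running (full history):  ", running_full)
print("running (after pruning): ", running_pruned)
```

With the full history, app-a is reported as running since January 1. After pruning, a consumer that starts from scratch does not see app-a at all, even though it is still running.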

Potential Approaches

After running into this issue repeatedly, I've prepared a set of code changes that introduce the following:

Keep Start Records for Active Apps/Services
- Start records remain in place until the corresponding stop event is encountered, preventing the loss of essential lifecycle information.
Consumer Registration
- By including consumer_guid and after_guid in usage-event requests, consumers can register themselves, allowing the Cloud Controller to avoid pruning events they have not yet processed.
Threshold-Based Pruning
- A configurable limit (threshold_for_keeping_unprocessed_records) ensures the database does not grow indefinitely if a registered consumer stays offline. If the record count exceeds this threshold, older entries can still be pruned.
Endpoints for Managing Consumers
- Operators or automated systems can view, remove, or otherwise manage registered consumers. This lets consumers deregister themselves and supports more informed decisions about when to request destructively_purge_all_and_reseed.
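As a rough sketch of how the first three ideas could combine into a single pruning decision, consider the model below. It is not the proposed implementation: the Event class, the prunable_guids function, and the cutoff_days parameter are hypothetical, and two behaviors are assumptions on my part, namely that the threshold protects only the newest unprocessed records and that start records of running resources are kept regardless of the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    guid: str
    resource_guid: str
    state: str      # "STARTED" or "STOPPED" in this simplified model
    age_days: int   # age of the record when the pruning job runs


def prunable_guids(events, cutoff_days, consumer_after_guid, threshold):
    """Return guids of events the pruning job may delete.

    events: all events, oldest first.
    consumer_after_guid: consumer_guid -> guid of the last event it processed.
    threshold: stand-in for threshold_for_keeping_unprocessed_records.
    """
    index_of = {e.guid: i for i, e in enumerate(events)}

    # Rule 1: a resource whose latest event is STARTED is still running,
    # so that start record is kept.
    latest = {}
    for e in events:
        latest[e.resource_guid] = e
    active_starts = {e.guid for e in latest.values() if e.state == "STARTED"}

    # Rule 2: everything after the slowest consumer's position is unprocessed.
    # An unknown after_guid is treated as "nothing processed yet" (assumption).
    if consumer_after_guid:
        slowest = min(index_of.get(g, -1) for g in consumer_after_guid.values())
        unprocessed = [e.guid for e in events[slowest + 1:]]
    else:
        unprocessed = []

    # Rule 3: only the newest `threshold` unprocessed records stay protected.
    protected = set(unprocessed[-threshold:]) if threshold > 0 else set()

    return [
        e.guid for e in events
        if e.age_days > cutoff_days
        and e.guid not in active_starts
        and e.guid not in protected
    ]


events = [
    Event("e1", "app-a", "STARTED", 40),  # app-a is still running
    Event("e2", "app-b", "STARTED", 39),
    Event("e3", "app-b", "STOPPED", 38),
    Event("e4", "app-c", "STARTED", 35),
    Event("e5", "app-c", "STOPPED", 34),
    Event("e6", "app-d", "STARTED", 2),   # app-d is still running
]
consumers = {"billing-consumer": "e3"}  # has processed everything up to e3

roomy = prunable_guids(events, 31, consumers, threshold=10)
tight = prunable_guids(events, 31, consumers, threshold=2)
unregistered = prunable_guids(events, 31, {}, threshold=10)
print(roomy, tight, unregistered)
```

With a generous threshold, only the events the consumer has already processed (e2, e3) are pruned. With a tight threshold, the oldest unprocessed event (e4) becomes prunable as well, and with no registered consumer, only the start records of running resources (e1, e6) outlive the cutoff.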

Questions for the Community

Have folks run into a similar challenge with start events being pruned prematurely, leading to confusion about how long resources have been running?
Have you had to use destructively_purge_all_and_reseed in a similar manner?
Does retaining usage events of running Apps and Services sound like a beneficial idea?
Do consumer registration and threshold-based pruning strike a reasonable balance between data retention and database size management?
Are there alternative approaches that could better manage event pruning while preserving critical usage data?
