A problem we've been facing involves the pruning of old AppUsageEvent and ServiceUsageEvent records. Often, these records are removed before the corresponding Apps or Services have actually stopped, making it difficult to determine how long those resources have been running. If a consumer starts polling after the start record is pruned, it may never know the true start time of that App or Service.
Challenges and Use of Purge/Seed
A specific pain point relates to the destructively_purge_all_and_reseed endpoints for App and Service Usage Events. These endpoints are sometimes used when event tables become inconsistent, often because start records were removed prematurely. While destructively_purge_all_and_reseed recreates records for running resources in the database, it assigns new start timestamps that do not reflect the actual creation or launch times. As a result, usage metrics can become misleading.
Core Problems
Pruning Before Completion
The system prunes old records to manage database growth. However, if an App/Service remains running for a long period, its start record may be deleted before the stop record exists.
A newly added or recovering consumer will then not see accurate start times.
Extended Downtime Leading to Missed Events
Sometimes, a usage-event polling service may go offline for an extended period (e.g., after an unnoticed crash). By the time it resumes polling, older events may have been pruned, leaving gaps in historical data.
Accurate State Visibility
It becomes challenging to piece together which Apps or Services are still running when critical events have already been removed. This forces reliance on destructively_purge_all_and_reseed to reset the data, at the cost of losing accurate historical start times.
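To make the first problem concrete, here is a minimal sketch of how a consumer might derive running resources from the event stream, and what happens once a start record has been pruned. The `UsageEvent` class, the `running_since` helper, and the two-state `STARTED`/`STOPPED` model are simplifications assumed for illustration; they are not the Cloud Controller's actual models.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical, simplified stand-in for an App/Service usage event.
    resource_guid: str
    state: str  # "STARTED" or "STOPPED" in this simplified model
    created_at: datetime


def running_since(events: List[UsageEvent]) -> Dict[str, datetime]:
    """Map each resource that still appears to be running to its start time."""
    started: Dict[str, datetime] = {}
    for event in sorted(events, key=lambda e: e.created_at):
        if event.state == "STARTED":
            started[event.resource_guid] = event.created_at
        elif event.state == "STOPPED":
            started.pop(event.resource_guid, None)
    return started


events = [
    UsageEvent("app-a", "STARTED", datetime(2024, 1, 1)),  # long-running
    UsageEvent("app-b", "STARTED", datetime(2024, 3, 1)),
    UsageEvent("app-b", "STOPPED", datetime(2024, 3, 2)),
]

# Age-based pruning: everything older than the cutoff is deleted.
cutoff = datetime(2024, 2, 1)
after_pruning = [e for e in events if e.created_at >= cutoff]

print(running_since(events))         # app-a is reported with its start time
print(running_since(after_pruning))  # app-a is missing although it still runs
```

A consumer that only ever sees `after_pruning` has no way to learn that `app-a` exists, let alone when it started.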
Potential Approaches
After running into this issue repeatedly, I’ve created a set of code changes designed to:
Keep start Records for Active Apps/Services
Records remain in place until the corresponding stop event is encountered, preventing the loss of essential lifecycle information.
Consumer Registration
By including consumer_guid and after_guid in usage-event requests, consumers can register themselves, allowing the Cloud Controller to avoid pruning events they have not yet processed.
Threshold-Based Pruning
A configurable limit (threshold_for_keeping_unprocessed_records) ensures the database does not grow indefinitely if a registered consumer stays offline. If the record count exceeds this threshold, older entries can still be pruned.
Endpoints for Managing Consumers
Operators or automated systems can view, remove, or otherwise manage registered consumers. This enables consumers to deregister themselves and make more informed decisions about when to request destructively_purge_all_and_reseed.
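The first three ideas above can be sketched as a single pruning decision. This is only an illustration under stated assumptions, not the code from the proposed changes: `Event`, `select_prunable`, and `consumer_positions` are hypothetical names, consumer progress is tracked as a numeric event id rather than the after_guid a real request would carry, and the threshold is interpreted as "stop honoring lagging consumers once the table holds more records than the limit". Start records of running resources are always kept in this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified usage-event row.
    id: int             # increasing, like a table primary key
    resource_guid: str
    state: str          # "STARTED" or "STOPPED" in this simplified model
    created_at: datetime


def select_prunable(
    events: List[Event],                 # ordered by id
    cutoff: datetime,                    # age-based pruning boundary
    consumer_positions: Dict[str, int],  # consumer_guid -> last processed id
    threshold: int,                      # threshold_for_keeping_unprocessed_records
) -> List[int]:
    """Return the ids of events that may be deleted."""
    # A resource whose most recent event is STARTED is still running,
    # so that start record is kept regardless of age.
    latest: Dict[str, Event] = {}
    for event in events:
        latest[event.resource_guid] = event
    protected_starts = {e.id for e in latest.values() if e.state == "STARTED"}

    # Events after the slowest consumer's position are unprocessed.
    slowest = min(consumer_positions.values(), default=None)

    # Assumption: above the threshold, lagging consumers are no longer honored.
    honor_consumers = len(events) <= threshold

    prunable = []
    for event in events:
        if event.created_at >= cutoff:
            continue  # not old enough to prune
        if event.id in protected_starts:
            continue  # start record of a running resource
        if honor_consumers and slowest is not None and event.id > slowest:
            continue  # a registered consumer has not processed it yet
        prunable.append(event.id)
    return prunable


sample = [
    Event(1, "app-a", "STARTED", datetime(2024, 1, 1)),  # still running
    Event(2, "app-b", "STARTED", datetime(2024, 1, 2)),
    Event(3, "app-b", "STOPPED", datetime(2024, 1, 3)),
    Event(4, "app-c", "STARTED", datetime(2024, 1, 4)),
    Event(5, "app-c", "STOPPED", datetime(2024, 3, 1)),
]
cutoff = datetime(2024, 2, 1)

print(select_prunable(sample, cutoff, {}, threshold=100))
print(select_prunable(sample, cutoff, {"billing": 2}, threshold=100))
print(select_prunable(sample, cutoff, {"billing": 2}, threshold=3))
```

With no registered consumers, only the start record of the running `app-a` survives the age cutoff. A consumer that has processed up to event 2 additionally protects events 3 and 4, until the table outgrows the threshold and age-based pruning applies again.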
Questions for the Community
Have folks run into a similar challenge with start events being pruned prematurely, leading to confusion about how long resources have been running?
Have you had to use destructively_purge_all_and_reseed in a similar manner?
Does retaining usage events of running Apps and Services sound like a beneficial idea?
Do consumer registration and threshold-based pruning strike a reasonable balance between data retention and database size management?
Are there alternative approaches that could better manage event pruning while preserving critical usage data?