This repository has been archived by the owner on Feb 15, 2022. It is now read-only.
Typically, service map records are emitted in pairs: destination (client) and target (server). An ES cluster got into a bad state where only half of a pair was received, which caused the front-end JS code to error. We're asking the front-end to add null checks, but we also need to check whether there's anything we can do on our end to improve resilience. First thoughts might be to:
Confirm that failed writes are being retried by the ES sink
Potentially have the ServiceMap prepper re-send records that might have already been sent.
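To make the first item concrete, here is a minimal sketch of what "retrying failed writes" could look like. It is not the ES sink's actual code: `write_with_retry`, its parameters, and the boolean-returning `write` callable are all hypothetical names used for illustration only.

```python
import time


def write_with_retry(write, record, max_attempts=3, base_delay_s=0.0):
    """Call write(record) until it succeeds or attempts run out.

    `write` is a hypothetical callable that returns True on success and
    False on failure. Returns the attempt number that succeeded, or None
    if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if write(record):
            return attempt
        if attempt < max_attempts:
            # Exponential backoff between attempts: base, 2*base, 4*base, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    return None
```

If the sink gives up after its last attempt (the `None` case here), the record is lost unless something upstream sends it again, which is where the second item comes in.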
For the ServiceMap changes, the current logic is:
After a set interval, find relationships between nodes in memory
Before sending the relationship record to the ES sink, first check if that record has already been sent previously
This is to prevent "duplicate" records from being sent every few minutes
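The logic described above can be sketched roughly as follows. This is an illustration under assumptions, not the ServiceMap prepper's real implementation: `Relationship`, `DedupingServiceMap`, and `flush` are hypothetical names, and the in-memory set stands in for however the prepper actually tracks previously sent records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    service: str
    kind: str  # "destination" (client side) or "target" (server side)
    peer: str


class DedupingServiceMap:
    def __init__(self):
        # Relationships that have already been handed to the sink.
        self._already_sent = set()

    def flush(self, relationships):
        """Return only the relationships that were not emitted before."""
        to_send = []
        for rel in relationships:
            if rel in self._already_sent:
                continue  # skip "duplicate" records
            self._already_sent.add(rel)
            to_send.append(rel)
        return to_send
```

In this sketch a relationship is marked as sent when it is handed off, not when the write is confirmed. If the sink then loses one half of a pair, later intervals skip it as a duplicate and the gap is never filled.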
It might make sense to remove the duplicate record check and continuously send service map records. Yes, we'll be increasing ES writes, but if we can assume that only a few hundred records will be sent every 3 minutes, that seems like a decent fallback to fill in missing service map gaps.
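A small simulation of that fallback, as a sketch only: `flush_without_dedup` and `FlakyIndex` are hypothetical, and the example assumes that re-sending the same record overwrites the existing document rather than creating a second one (modeled here by keying the stored documents on the record itself).

```python
def flush_without_dedup(relationships):
    """Send every relationship on every interval, with no 'already sent' check."""
    return list(relationships)


class FlakyIndex:
    """Stand-in for an index that silently drops its first few writes."""

    def __init__(self, drop_first_n):
        self._drops_left = drop_first_n
        self.docs = {}
        self.write_count = 0

    def write(self, record):
        self.write_count += 1
        if self._drops_left > 0:
            self._drops_left -= 1
            return False  # write lost
        # Keyed by the record, so a re-send overwrites instead of duplicating.
        self.docs[record] = record
        return True


pair = [
    ("frontend", "destination", "checkout"),
    ("checkout", "target", "frontend"),
]
index = FlakyIndex(drop_first_n=1)
for interval in range(2):
    for record in flush_without_dedup(pair):
        index.write(record)
```

After the first interval only half of the pair is stored; the second interval fills in the missing half. The cost is that every interval writes every record again, which is the increase in ES writes mentioned above.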
Related to #479 and opendistro-for-elasticsearch/trace-analytics#32.