-
Hi @law, here is a list of tools and techniques I use when debugging complex pipelines:
For your case specifically, I would investigate whether events are being dropped or are getting stuck in a retry loop. For example, is the component_discarded_events_total metric greater than 0?
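If it helps, here is a rough sketch of how that check could be scripted against Prometheus-format text. It assumes Vector's internal metrics are exposed in that format somewhere you can read them; the SAMPLE text, the component names, and the `discarded_events` helper are all made up for illustration, so treat it as a starting point rather than a recipe.

```python
import re

# Hypothetical sample of Prometheus-format text; real output will differ.
SAMPLE = '''\
# TYPE vector_component_discarded_events_total counter
vector_component_discarded_events_total{component_id="nginx_filter",intentional="true"} 1200
vector_component_discarded_events_total{component_id="prometheus",intentional="false"} 0
vector_component_sent_events_total{component_id="prometheus"} 134480
'''

# Metric name, optional {labels}, then the sample value.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def discarded_events(text, suffix="component_discarded_events_total"):
    """Return (labels, value) pairs for discard counters greater than 0."""
    hits = []
    for line in text.splitlines():
        # Skip comments (# HELP / # TYPE) and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        match = LINE_RE.match(line)
        if not match or not match.group("name").endswith(suffix):
            continue
        value = float(match.group("value"))
        if value > 0:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            hits.append((labels, value))
    return hits


for labels, value in discarded_events(SAMPLE):
    print(labels, value)
```

Any component that shows up in the output is discarding events, and the labels tell you which one to look at first.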
-
One potential thing to check here, since you mentioned it was off by a factor of 10, is whether the metric type and interval are being set correctly in Datadog. We've seen issues in the past where the interval or type was wrong, causing values to appear to be off by a factor of 10. Also, what version of Vector are you running?
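To illustrate how an interval mismatch can produce exactly that kind of error, here is a small worked example. The numbers are hypothetical, and it assumes the usual rate convention where a per-second value is the count in a window divided by the window length in seconds.

```python
def per_second(count, interval_s):
    """Convert a count observed over interval_s seconds into a per-second rate."""
    return count / interval_s


# Hypothetical numbers: 1460 events counted in each 10-second flush window.
events_per_flush = 1460
actual_interval_s = 10

# Correct interpretation: divide by the real window length.
true_rate = per_second(events_per_flush, actual_interval_s)

# Wrong interpretation: the same count treated as if it covered 1 second.
wrong_rate = per_second(events_per_flush, 1)

print(true_rate, wrong_rate, wrong_rate / true_rate)
```

With a 10-second window declared as 1 second (or the other way around), the two readings differ by a factor of 10, which is why the interval and type are worth ruling out first.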
-
Thank you so much for getting back to me, and my apologies for the long reply-time - I didn't get an email for some reason, and just checked the thread today on a whim. I've run
This is on a pod where Vector (and nginx) have been running for about 4 hours. What stands out to me is the 'prometheus' sink. It is receiving 2.54M events (146/s) but outputting only 134.48k events. There appears to be significant event reduction here, but I'm not sure why. There are no errors in the Vector logs. Do I need to increase logging to get a better understanding of what's going on here?
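For reference, here is the quick arithmetic on those sink numbers, as a sketch. The counts are the rounded figures quoted above, and "about 4 hours" is taken as exactly 4 hours, which is an assumption.

```python
# Rounded figures quoted for the 'prometheus' sink.
received = 2_540_000   # events in
sent = 134_480         # events out
rate_per_s = 146       # reported receive rate
uptime_s = 4 * 3600    # assumption: "about 4 hours" treated as exactly 4 hours

out_ratio = sent / received          # fraction of received events that were sent
reduction = received / sent          # how many events in per event out
expected_in = rate_per_s * uptime_s  # events implied by the rate over the uptime

print(f"sent/received: {out_ratio:.1%}")
print(f"reduction factor: {reduction:.1f}x")
print(f"events implied by rate: {expected_in:,}")
```

The sink is emitting roughly 5% of what it receives, a reduction of about 19x rather than 10x, and the received total is in the same ballpark as 146/s sustained over the uptime.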
-
More grist for the mill: I captured all the using 'grep', I got all the lines matching the string 'monolith-nginx' from that text-file, and put them in their own text-file. wc -l nginx-output.txt shows that file has 36,997 log-lines in it. Further analysis shows that the only pod-name in that 5-minute snippet is 'monolith-nginx-57f8c68bc8-d4jr6'. I then went over to the Datadog log-explorer, set my time-range to 10:32pm-10:37pm, and told it to only find logs for the pod-name 'monolith-nginx-57f8c68bc8-d4jr6'. Datadog log-count? 14,181. I don't perzactly know what that means, but I'm even MORE stumped now.
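In case anyone wants to repeat the comparison, this is roughly what the counting step looks like as a script. The `count_matching` helper and the sample lines are made up for illustration; the two totals at the bottom are the ones quoted above.

```python
def count_matching(lines, needle):
    """Count lines containing needle, similar in spirit to `grep -c`."""
    return sum(1 for line in lines if needle in line)


# Hypothetical captured lines; a real run would read the captured text-file.
sample_lines = [
    "monolith-nginx-57f8c68bc8-d4jr6 GET /health 200",
    "some-other-pod GET / 200",
    "monolith-nginx-57f8c68bc8-d4jr6 GET /api 200",
]
print(count_matching(sample_lines, "monolith-nginx"))

# Totals quoted above for the same 5-minute window.
captured_count = 36_997
datadog_count = 14_181
print(f"Datadog / captured: {datadog_count / captured_count:.1%}")
```

On those totals Datadog shows a bit under 40% of the captured lines, so in this window Datadog is the one with fewer logs.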
-
I'm experiencing a significant discrepancy between metrics generated via Vector's log_to_metric transform and Datadog when processing the same nginx logs. The Vector pipeline consistently shows approximately 1/10th of the traffic volume compared to Datadog's measurements.
Setup details:
Current observations:
Configurations:
Mimir config: https://pastebin.com/ixUGPwNP
Vector config: https://pastebin.com/2RfCGUK8
Questions:
Any guidance on troubleshooting techniques or configuration issues would be greatly appreciated.