Local dataprepper DLQ file is not getting filled with dropped traces #4736
@dontknowany, do you have any logs from Data Prepper itself that may help us understand what the cause might be? Also, do you see whether the file is being created by Data Prepper at all? Is it empty, or is there no file?
@dlvenable
@dontknowany DataPrepper doesn't put invalid traces into the DLQ. That is a feature we are working on, but for now events are sent to the DLQ only when they fail to be written to the sink.
@kkondaka Another question: why is DataPrepper putting traces into its logs then? Is it just for documentation's sake, to show that something is happening? Are the traces that appear in the DataPrepper logs wrong or invalid in any shape or form, or is it expected behaviour that they are logged that way? :)
Maybe it would help to get some metrics from DataPrepper. They usually indicate the buffer utilization quite nicely, and we might get an idea of where your data is stuck. It would also be good to learn the status of the circuit breakers.
@KarstenSchnitter Regarding metrics: we have a complete dashboard in Grafana for the various DataPrepper metrics: buffer usage, heap usage, CPU usage, records written to and read from the buffer, latency, records written to OpenSearch, and so on. You name it, we got it, so to speak. The interesting thing is that nothing in the metrics indicates an issue. There are no records that failed to be written into the buffer, nor are there errors from OpenSearch about failed bulk ingestion requests. Regarding the circuit breaker: we have it in place, but only for a month or so. We did not utilize DataPrepper to its limit, which is why we did not configure it in the past. Although our load has not changed, I have configured it now just in case.
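For context, the heap circuit breaker in our data-prepper-config.yaml looks roughly like the sketch below (values are illustrative, not our exact limits, and the key names are from memory of the docs):

```yaml
circuit_breakers:
  heap:
    usage: 6gb             # heap threshold at which the breaker trips (illustrative value)
    reset: 2s              # cooldown before the breaker is checked again after tripping
    check_interval: 500ms  # how often heap usage is checked
```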
@dontknowany How do the write rates of the collector-pipeline compare to the raw-pipeline and to the messages indexed in OpenSearch? How do they compare to the actual rate of sent messages? Can you provide an example of a failing event? If the mapping of the event fails, it will not be written to the buffer at all, if I remember correctly.
@KarstenSchnitter Looking at our metrics, there are on average 400 messages per second coming into our stack through our 3 collector pipelines/instances. They are all pushed to Kafka, which we use for buffering, and after that a single OTel ingestor instance does some light filtering and downsamples to 25%. So in the end roughly 100 messages per second are ingested into OpenSearch via DataPrepper. I don't see an issue with that; it looks fine to me. Regarding an example of a failed event: I can give you an example of a JSON trace that gets logged by DataPrepper. The only thing I noticed about the trace below is that the trace_id and the span_id look a bit off. Could that be the reason why the trace is getting logged? I also can't find the trace in OpenSearch (which makes sense if the trace is invalid).
I don't know why this trace is getting logged by DataPrepper. It's disrupting our log messages, since it floods the logs and the important entries get buried between those big trace logs. It also generates a lot of logs in OpenSearch, since every line of the trace above becomes one log message in OpenSearch :/
Indeed, the trace and span IDs look weird. They are supposed to be a 16-byte and an 8-byte array respectively, per the OTLP protobuf definition. DataPrepper maps these fields in the class OTelProtoCodec with the function convertByteStringToString:

```java
import org.apache.commons.codec.binary.Hex;

// ...

public static String convertByteStringToString(ByteString bs) {
    return Hex.encodeHexString(bs.toByteArray());
}
```

This means the document to be indexed in OpenSearch should contain a 32-character hex string for the trace ID and a 16-character hex string for the span ID. This is clearly not the case in your example, where the values appear to be unmapped. I am also curious about the empty
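For comparison, correctly mapped IDs in the indexed document should look roughly like this (example values taken from the W3C trace-context documentation; field names may differ depending on your index template):

```yaml
traceId: "4bf92f3577b34da6a3ce929d0e0e4736"  # 32 hex characters = 16 bytes
spanId: "00f067aa0ba902b7"                   # 16 hex characters = 8 bytes
```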
The event that I sent was indeed from DataPrepper and not from one of the OpenTelemetry Collectors. Just to clarify our setup, our complete trace pipeline looks like this: Application --> OTel Collector --> Kafka --> OTel Collector --> DataPrepper --> OpenSearch
Do you by any chance have an event from the OTel Collector that was sent to DataPrepper and resulted in an output like the one you pasted earlier?
Sorry for not being as active regarding this issue. I was on vacation, and there were also other things that needed some attention. Regarding your question: our OTel Collectors are really quiet when it comes to logs/events. They only emit something when there is a real issue, for example DataPrepper being unavailable. We are only getting a lot of events from DataPrepper itself, not from our OTel Collectors. We are, however, getting events from the OTel Collector that we use for pulling traces out of Kafka and ingesting them into DataPrepper. This is one of the events:
I would assume that if the OTel Collector drops the invalid data, it would not cause any event in DataPrepper, since the dropped items would never reach it.
I was wondering if you could use the debug exporter with detailed verbosity to log the trace that is rejected by DataPrepper to the console. This is also described in the Troubleshooting section of the OpenTelemetry Collector configuration.
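Something along these lines in the collector configuration should do it; this is just a sketch, and the kafka receiver and otlp exporter below only mirror how I understood your setup (addresses are placeholders):

```yaml
receivers:
  kafka:
    brokers: ["kafka:9092"]        # placeholder broker address
    protocol_version: 2.0.0

exporters:
  debug:
    verbosity: detailed            # prints full span details to the collector console
  otlp:
    endpoint: data-prepper:21890   # placeholder DataPrepper OTLP endpoint

service:
  pipelines:
    traces:
      receivers: [kafka]
      exporters: [debug, otlp]     # keep your existing exporter alongside debug
```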
Good idea. I will enable that, see if I get any useful information out of those debug logs, and then let you know with some examples.
Hi all,
I'm not sure if I'm missing something or if I'm doing something wrong on my end.
In my current Data Prepper setup there are traces that are getting dropped and written to Data Prepper's own logs, which pollutes those logs: important messages get swallowed in between all the dropped/logged traces. I would like to investigate those dropped traces further, so I added the local DLQ path to my pipeline config under the sink section so that those traces get written/exported to that file for further troubleshooting instead of to the logs.
The problem I'm facing now is that the DLQ file stays empty and is not being filled with the dropped traces. They are still getting dropped into the Data Prepper logs. Data Prepper is also not complaining about a misconfigured DLQ in its logs. The DLQ file just stays empty for some reason.
The option to use AWS S3 for the DLQ does not work for us, since we don't use it in our tech stack. We would need support for GCS or other third-party S3-compatible options.
I found the config for the DLQ path here: https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md
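Based on that documentation, my understanding of the relevant part of the sink section is roughly the following sketch (hosts, credentials, and the file path are placeholders, not my literal values):

```yaml
sink:
  - opensearch:
      hosts: ["https://opensearch-host:9200"]               # placeholder host
      # ... auth and index settings omitted ...
      dlq_file: /usr/share/data-prepper/dlq/trace-dlq.json  # local DLQ file from the README above
```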
This is my current pipeline.yaml file:
Maybe I misunderstand what the DLQ is or should be used for; I thought it would be suitable for my use case. I also don't know whether it is expected behaviour for Data Prepper to write complete traces into its own logs.
I'm using the newest release of Data Prepper (2.8.0).
I hope someone can shed some light on my problem and help me fix it.
If you need any additional information that I did not provide or forgot to provide, just ask and I will try my best to give you that information.
Cheers!