Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce() from kafka.Producer will always return no errors if cfg.Sync=true #513

Open
endorama opened this issue Jul 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@endorama
Copy link
Member

endorama commented Jul 2, 2024

Produce() method of kafka.Producer has this signature:

func (p *Producer) Produce(ctx context.Context, rs ...apmqueue.Record) error {

It also has 2 behaviors: async and sync production. This behavior is triggered by the cfg.Sync config option.

In the async production case the errors occurring are passed over to the cfg.ProduceCallback if specified. This is correct.

In the sync production case errors occurring are not collected in any way and will never be returned to the caller. Worse: the caller will never know an error happened.

This is the relevant code where we produce to kafka:

apm-queue/kafka/producer.go

Lines 243 to 258 in a508dbd

p.client.Produce(ctx, kgoRecord, func(r *kgo.Record, err error) {
defer wg.Done()
// kotel already marks spans as errors. No need to handle it here.
if err != nil {
p.cfg.Logger.Error("failed producing message",
zap.Error(err),
zap.String("topic", strings.TrimPrefix(r.Topic, namespacePrefix)),
zap.Int64("offset", r.Offset),
zap.Int32("partition", r.Partition),
zap.Any("headers", headers),
)
}
if p.cfg.ProduceCallback != nil {
p.cfg.ProduceCallback(r, err)
}
})

I'm not sure this behavior is intended and I would consider it a bug. The solution could be to collect errors with errors.Join and return the resulting error on line 261

apm-queue/kafka/producer.go

Lines 260 to 264 in a508dbd

if p.cfg.Sync {
wg.Wait()
}
return nil
}

This way on cfg.Sync=true errors would be collected and returned. On cfg.Sync=false the code would not wait thus returning nil keeping the current behavior.

@endorama endorama added the bug Something isn't working label Jul 2, 2024
@simitt
Copy link
Contributor

simitt commented Sep 9, 2024

@endorama is my understanding correct that this poses a problem for non-production environment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants