Batching at kafka receiver #351

Open
srikanthccv opened this issue Jul 19, 2024 · 3 comments

Comments

@srikanthccv
Member

Increasing the receive size doesn't help reduce the number of inserts, because the kafkareceiver forwards each Kafka message to the next consumer individually. The ideal implementation would combine the spans/logs/metrics of individual messages into one big ResourceSpans/ResourceLogs/ResourceMetrics collection by appending them, and send that combined batch across the pipeline (see the sketch after the snippet below).

for {
	select {
	case message, ok := <-claim.Messages():
		if !ok {
			return nil
		}
		start := time.Now()
		c.logger.Debug("Kafka message claimed",
			zap.String("value", string(message.Value)),
			zap.Time("timestamp", message.Timestamp),
			zap.String("topic", message.Topic))
		if !c.messageMarking.After {
			session.MarkMessage(message, "")
		}
		ctx := c.obsrecv.StartTracesOp(session.Context())
		statsTags := []tag.Mutator{tag.Upsert(tagInstanceName, c.id.String())}
		_ = stats.RecordWithTags(ctx, statsTags,
			statMessageCount.M(1),
			statMessageOffset.M(message.Offset),
			statMessageOffsetLag.M(claim.HighWaterMarkOffset()-message.Offset-1))
		traces, err := c.unmarshaler.Unmarshal(message.Value)
		if err != nil {
			c.logger.Error("failed to unmarshal message", zap.Error(err))
			if c.messageMarking.After && c.messageMarking.OnError {
				session.MarkMessage(message, "")
			}
			return err
		}
		spanCount := traces.SpanCount()
		err = c.nextConsumer.ConsumeTraces(session.Context(), traces)
		// ...
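A minimal sketch of what batching inside ConsumeClaim could look like, reusing the same handler fields (c, session, claim) as the snippet above. batchSize, flushInterval, flush, and pending are hypothetical names, not existing kafkareceiver config; the idea is to splice each message's ResourceSpans into one accumulating ptrace.Traces via MoveAndAppendTo, and to mark offsets only after the combined batch is delivered:

// Sketch only: assumes go.opentelemetry.io/collector/pdata/ptrace and
// github.com/IBM/sarama are imported; stats and logging omitted for brevity.
batch := ptrace.NewTraces()
var pending []*sarama.ConsumerMessage // messages awaiting offset commit

flush := func(ctx context.Context) error {
	if batch.ResourceSpans().Len() == 0 {
		return nil
	}
	if err := c.nextConsumer.ConsumeTraces(ctx, batch); err != nil {
		return err
	}
	// Mark offsets only once the whole batch was accepted downstream.
	for _, m := range pending {
		session.MarkMessage(m, "")
	}
	pending = pending[:0]
	batch = ptrace.NewTraces()
	return nil
}

ticker := time.NewTicker(flushInterval) // hypothetical time-based flush knob
defer ticker.Stop()
for {
	select {
	case message, ok := <-claim.Messages():
		if !ok {
			return flush(session.Context())
		}
		traces, err := c.unmarshaler.Unmarshal(message.Value)
		if err != nil {
			return err // error handling simplified vs. messageMarking.OnError
		}
		// Move this message's ResourceSpans onto the accumulating batch.
		traces.ResourceSpans().MoveAndAppendTo(batch.ResourceSpans())
		pending = append(pending, message)
		if len(pending) >= batchSize { // hypothetical size threshold
			if err := flush(session.Context()); err != nil {
				return err
			}
		}
	case <-ticker.C:
		if err := flush(session.Context()); err != nil {
			return err
		}
	case <-session.Context().Done():
		// Unflushed messages were never marked, so the consumer group
		// will redeliver them after the rebalance.
		return nil
	}
}

Because unmarked messages are redelivered after a rebalance, dropping the in-memory batch on shutdown keeps the at-least-once guarantee.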

@srikanthccv
Member Author

There is work underway to move batching to the exporter, but it will take a while to become stable.

@grandwizard28
Collaborator

Just curious, why not use the batch processor?

@srikanthccv
Member Author

The batch processor queues the item and returns immediately. Kafka receiver then marks the message as consumed, despite it not yet being written to storage. This creates a risk of data loss if the collector crashes or the storage backend becomes unavailable for an extended period. We should only mark messages as consumed after receiving confirmation from ClickHouse that the data has been successfully written.
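A minimal sketch of that marking discipline, assuming messageMarking.After is enabled and no batch processor or exporter sending queue sits between the receiver and the ClickHouse exporter, so a nil error from ConsumeTraces means the write was actually acknowledged:

traces, err := c.unmarshaler.Unmarshal(message.Value)
if err != nil {
	return err
}
if err := c.nextConsumer.ConsumeTraces(session.Context(), traces); err != nil {
	// Don't mark the offset: the consumer group will redeliver the
	// message, giving at-least-once delivery into ClickHouse.
	return err
}
// Safe to mark only now: the synchronous pipeline has confirmed the write.
session.MarkMessage(message, "")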
