
Facilitate asynchronous realtime ingestion on decoding & transformation #13695

Open · wants to merge 2 commits into master

Conversation

@lnbest0707-uber (Contributor) commented Jul 26, 2024

Ingestion enhancement feature request, resolving the issues mentioned in #13319.
Pinot ingestion currently uses strictly serial processing. It fetches a batch of messages from Kafka and then processes the messages in the batch one by one, in offset order, to:

  1. Decode
  2. Transform
  3. Index

This makes it possible to reuse objects created along the way for better memory efficiency, but it cannot utilize all system resources. There are multiple solutions, each with pros and cons:

  1. Async processing for each step, as this patch introduces.
    • It preserves the message order and retains the same offset control logic.
    • It still cannot fully utilize the system resources.
    • It adds memory and GC overhead, since objects cannot be reused as before. (A TODO is to make the size of each batch configurable.)
  2. Batch (multiple-executor) processing on decoding and transformation.
    • It is hard to ensure message order; it might only provide at-least-once consumption instead of the current (almost) exactly-once.
  3. Full batch processing on all 3 steps.
    • The current indexing logic and data structures do not really support parallel processing.
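The asynchronous per-step approach (option 1) can be sketched as a queue handoff between a decode/transform stage and an indexing stage. This is a minimal illustration, not the actual patch; class and method names here are invented, and integer offsets stand in for decoded rows:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncPipelineSketch {
    // Option 1 in miniature: a single decode/transform worker feeds the
    // indexing stage through a queue, so the two stages overlap in time
    // while the original offset order is preserved.
    public static List<Integer> process(List<Integer> offsets) throws Exception {
        BlockingQueue<Integer> transformed = new LinkedBlockingQueue<>();
        ExecutorService decodeAndTransform = Executors.newSingleThreadExecutor();
        for (Integer offset : offsets) {
            // Stand-in for decode + transform; a single-threaded executor
            // runs submitted tasks strictly in submission order.
            decodeAndTransform.submit(() -> transformed.add(offset));
        }
        List<Integer> indexed = new ArrayList<>();
        for (int i = 0; i < offsets.size(); i++) {
            indexed.add(transformed.take()); // indexing stage consumes in order
        }
        decodeAndTransform.shutdown();
        return indexed;
    }
}
```

Because the executor has exactly one worker thread, tasks cannot overtake each other, which is how the patch can keep the existing offset control logic intact.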

Comparing the CPU usage and consumption speed on the same server before and after enabling "ASYNCHRONOUS" (at 10:00):

[Charts omitted: CPU usage and consumption speed before/after the change]

The additional CPU usage contributes a ~10% increase in ingestion speed.

Notes:
The new mode performs better on computation-heavy ingestion but does not really help lightweight use cases. In light-computation use cases, the extra memory and GC overhead would offset the gain from async processing.

@codecov-commenter commented Jul 27, 2024

Codecov Report

Attention: Patch coverage is 61.58192% with 68 lines in your changes missing coverage. Please review.

Project coverage is 61.99%. Comparing base (59551e4) to head (c3ca31a).
Report is 804 commits behind head on master.

Files Patch % Lines
...a/manager/realtime/RealtimeSegmentDataManager.java 59.52% 53 Missing and 15 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13695      +/-   ##
============================================
+ Coverage     61.75%   61.99%   +0.24%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2555     +119     
  Lines        133233   140750    +7517     
  Branches      20636    21891    +1255     
============================================
+ Hits          82274    87257    +4983     
- Misses        44911    46850    +1939     
- Partials       6048     6643     +595     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.96% <61.58%> (+0.25%) ⬆️
java-21 61.87% <61.58%> (+0.24%) ⬆️
skip-bytebuffers-false 61.98% <61.58%> (+0.23%) ⬆️
skip-bytebuffers-true 61.84% <61.58%> (+34.12%) ⬆️
temurin 61.99% <61.58%> (+0.24%) ⬆️
unittests 61.99% <61.58%> (+0.24%) ⬆️
unittests1 46.47% <61.58%> (-0.43%) ⬇️
unittests2 27.73% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@Jackie-Jiang (Contributor) left a comment:

How do you guarantee the message order when message batches are processed asynchronously?

Comment on lines +264 to +267
private AtomicInteger _numRowsConsumed = new AtomicInteger(0);
// Can be different from _numRowsConsumed when metrics update is enabled.
private AtomicInteger _numRowsIndexed = new AtomicInteger(0);
private AtomicInteger _numRowsErrored = new AtomicInteger(0);

These can be final?
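The reviewer's suggestion works because final only fixes the field reference; the AtomicInteger's value stays mutable. A minimal sketch (the class name is illustrative, not the actual patch):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FinalCounterSketch {
    // final means the reference can never be reassigned, which is safe here
    // because the counters are created once and only ever incremented.
    private final AtomicInteger _numRowsConsumed = new AtomicInteger(0);

    public int consumeRows(int n) {
        for (int i = 0; i < n; i++) {
            _numRowsConsumed.incrementAndGet(); // mutation still allowed
        }
        return _numRowsConsumed.get();
    }
}
```

Making the fields final also documents that they are initialized exactly once, which helps when multiple threads read them.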

BlockingQueue<Pair<List<GenericRow>, Integer>> transformedQueue = new LinkedBlockingQueue<>();
AtomicInteger submittedMsgCount = new AtomicInteger(0);
// TODO: tune the number of threads
ExecutorService decodeAndTransformExecutor = Executors.newFixedThreadPool(1);

Starting a new executor per message batch can create big overhead. Consider creating an executor that is shared across batches.

}
});

indexingThread.start();

Who is executing this thread?

Reply (Contributor):

Consumer thread for the partition is the one kicking off this "indexingThread". I don't understand why we kick off a separate thread, and then in the next line, we wait for it to finish. What's the difference if we don't spin off a new thread, and use the main thread (consuming thread) to do the indexing?

@mcvsubbu (Contributor):

cc: @sajjad-moradi

@mcvsubbu (Contributor):

Is it useful to create a subclass of RealtimeSegmentDataManager that consumes asynchronously?

@Jackie-Jiang (Contributor):

To ensure the ingestion order, we might be able to use a producer-consumer pattern, where the consumer thread creates MessageBatches and puts them into a queue, and a separate ingestion thread pulls MessageBatches from the queue, transforms the records, and indexes them into the segment. This way we can use 2 threads per partition.
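The suggested producer-consumer split can be sketched with a bounded queue between the fetching thread and the ingestion thread. This is a simplified illustration under that suggestion, not Pinot code; lists of integers stand in for message batches:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerSketch {
    // The consumer (fetching) thread puts batches into a bounded queue; a
    // single ingestion thread takes them in FIFO order, so ordering is
    // preserved with exactly 2 threads per partition. The bounded capacity
    // applies backpressure when ingestion falls behind fetching.
    public static List<Integer> run(List<List<Integer>> batches) throws InterruptedException {
        BlockingQueue<List<Integer>> queue = new ArrayBlockingQueue<>(4);
        List<Integer> indexed = new ArrayList<>();
        Thread ingestion = new Thread(() -> {
            try {
                for (int i = 0; i < batches.size(); i++) {
                    indexed.addAll(queue.take()); // transform + index stand-in
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ingestion.start();
        for (List<Integer> batch : batches) {
            queue.put(batch); // fetch stand-in; blocks while the queue is full
        }
        ingestion.join(); // join() gives a happens-before edge for 'indexed'
        return indexed;
    }
}
```

The FIFO queue is what carries the ordering guarantee; any design with more than one ingestion thread would lose it, which is the constraint raised elsewhere in this review.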

@mcvsubbu (Contributor):

How would ingestion rate limiting work?

@lnbest0707-uber (Contributor, Author):

> To ensure the ingestion order, we might be able to use a producer-consumer pattern, where consumer thread creates MessageBatchs and put them into a queue; another ingestion thread pull MessageBatchs from the queue, transform the records and index them into segment. This way we can use 2 threads per partition.

Thanks for the review. Right now it is only using 1 thread to guarantee the order. For multiple threads, I am thinking about adapting the approach mentioned in "Multi-topic ingestion support" if it goes through review.


BlockingQueue<Pair<List<GenericRow>, Integer>> transformedQueue = new LinkedBlockingQueue<>();
AtomicInteger submittedMsgCount = new AtomicInteger(0);
// TODO: tune the number of threads

We can't have more than one thread here otherwise the order of indexed rows will be different, and that's something we can't tolerate.
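The single-thread constraint the reviewer describes can be checked directly: a one-thread pool executes tasks strictly in submission order, so the indexed row order matches the consumed offset order. A minimal sketch with invented names:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderSketch {
    // With exactly one worker thread, submitted tasks cannot overtake each
    // other, so the recorded order always equals the submission order.
    // With a pool of size > 1, this guarantee disappears.
    public static List<Integer> indexOrder(int numRows) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        List<Integer> order = new CopyOnWriteArrayList<>();
        for (int row = 0; row < numRows; row++) {
            final int r = row;
            executor.submit(() -> order.add(r)); // stand-in for indexing row r
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        return order;
    }
}
```

This is why the `// TODO: tune the number of threads` comment in the snippet above is limited: the thread count here can only ever be 1 without breaking the ordering invariant.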

Labels
Configuration Config changes (addition/deletion/change in behavior) documentation enhancement ingestion real-time
6 participants