
Add delay before the peon drops the segments after publishing them #15373

Conversation

Contributor

@kaisun2000 kaisun2000 commented Nov 14, 2023

Fixes #12168

Description

Currently, in the realtime ingestion (Kafka/Kinesis) case, after publishing the segments and upon acknowledgement from the coordinator that the segments are already placed in some historicals, the peon unannounces the segments (telling the rest of the cluster that the segments are no longer served by this peon) and drops them from its cache and sink timeline in one shot.

In-transit queries from brokers that still think the segments are on the peon can hit a NullPointerException while the peon is unsetting the hydrants in the sinks.

The fix lets the peon wait for a configurable delay period after unannouncing the segments before dropping them, removing them from the cache, etc.

This delayed approach is similar to how historicals handle segments being moved out.
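
As an illustration only (not code from this PR; the class and method names below are hypothetical), the unannounce-then-delayed-drop pattern could be sketched like this:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class DelayedDropSketch {
    private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    // Segments this peon is still willing to serve queries for.
    private final ConcurrentHashMap<String, Boolean> served = new ConcurrentHashMap<>();

    public void announce(String segmentId) {
        served.put(segmentId, Boolean.TRUE);
    }

    // Unannounce right away, but keep serving the segment for dropDelayMillis so
    // in-transit queries routed with a stale broker view can still find it.
    public ScheduledFuture<?> unannounceThenDrop(String segmentId, long dropDelayMillis) {
        // (the real unannounce to the rest of the cluster would happen here)
        return exec.schedule(() -> { served.remove(segmentId); }, dropDelayMillis, TimeUnit.MILLISECONDS);
    }

    public boolean canServe(String segmentId) {
        return served.containsKey(segmentId);
    }

    // Block until the scheduled drop has actually run (handy for tests).
    public void awaitDrop(ScheduledFuture<?> drop) {
        try {
            drop.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public void shutdown() {
        exec.shutdownNow();
    }
}
```

The key point is that the drop is scheduled rather than performed inline, so the query path keeps working during the propagation window.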

See the details of discussion in Apache slack channel here

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

Release note


Key changed/added classes in this PR
  • MyFoo
  • OurBar
  • TheirBaz

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

This is to make sure an in-transit query that sees a segment about to be
dropped on the peon does not fail with a NullPointerException.
The delay approach is similar to the strategy historicals use when
segments are moved out.
@kaisun2000 changed the title from "Add delay before the peon drops segments after publishing them" to "Add delay before the peon drops the segments after publishing them" on Nov 14, 2023
@abhishekagarwal87
Contributor

@kaisun2000 - How does this compare to #15260?

@gianm
Contributor

gianm commented Nov 20, 2023

We had some discussion of this in Slack. In summary, #15260 fixes an NPE that occurs when a query comes in for a segment that has just been unloaded. This PR adds a delay between unannounce and unload, which reduces the likelihood that a query will come in for a segment that has just been unloaded. Both changes are good IMO— we want this case to be rare (which this patch helps with), but when it does happen it needs to be handled properly (which #15260 helps with).

With this patch, the realtime tasks behave more similarly to Historicals, which already have a 30-second delay between unannouncing and unloading (druid.segmentCache.dropSegmentDelayMillis).
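
For reference, that historical-side setting lives in the Historical's runtime properties; the value shown is the documented default:

```properties
# Historical runtime.properties: how long to wait between unannouncing a
# segment and actually dropping it from the segment cache.
druid.segmentCache.dropSegmentDelayMillis=30000
```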

@kaisun2000
Contributor Author

kaisun2000 commented Nov 20, 2023

@abhishekagarwal87,

The fix in #15260 from @gianm is in the query path: it checks for the potential null-pointer condition and avoids it by returning a missing-segment error to the broker. The broker, seeing the missing segment, would retry.

My fix is in the ingestion path: it adds a configurable delay in the segment handoff phase. More specifically, after the peon gets the acknowledgement from the coordinator that the segments in the handoff batch are already placed in some historicals, the peon waits for some time before removing the segments from its timeline, resetting the hydrants, and dropping the segments along with the files backing them on the file system. As @gianm pointed out, this race pattern is not unique to the peon side; it also exists when historicals move segments out, and the approach taken on the historical side is likewise a configurable delay. So adding a configurable delay is a "good idea" in @gianm's own words.

Thinking about this race a bit more abstractly, in both the peon and historical cases the issue arises because it takes some time for a segment placement change to propagate (via ZooKeeper) to the brokers. In-transit queries may therefore hit a data server (peon or historical) that has already moved out the segment being queried. Delaying the drop compensates for the time it takes brokers to see the up-to-date placement of the segment, and in the meantime eliminates the case where an in-transit query misses the segment.

So I would say the two approaches complement each other. The ingestion-delay fix has the benefit of not causing unnecessary query retry load.

See the details of discussion in Apache slack channel here

@abhishekagarwal87
Contributor

Thanks @gianm and @kaisun2000, for the context. One last question: does it need to be configurable? I don't think so. We can either hardcode it to 30 seconds or we can reuse the same config option that already exists.

druid.segmentCache.dropSegmentDelayMillis

There is no need for a per-ingestion configuration here.

@kaisun2000
Contributor Author

> Thanks @gianm and @kaisun2000, for the context. One last question: does it need to be configurable? I don't think so. We can either hardcode it to 30 seconds or we can reuse the same config option that already exists.
>
> druid.segmentCache.dropSegmentDelayMillis
>
> There is no need for a per-ingestion configuration here.

@abhishekagarwal87, I am pretty open to any suggestions.

As you said, maybe a per-peon/MiddleManager configuration is just as good. However, one peon takes one ingestion spec, which is why I put it in the ingestion spec.

Can you point me to another example of a per-MiddleManager configuration that is carried over to the peons? I can change the configuration that way.

@kaisun2000
Contributor Author

ping @abhishekagarwal87

@abhishekagarwal87
Contributor

abhishekagarwal87 commented Dec 2, 2023 via email

@kaisun2000
Contributor Author

@abhishekagarwal87 I did a careful examination of the peon config to achieve the goal of keeping the drop-delay config out of the task spec. Here is the proposal.

First, the historical-side delay config lives in SegmentLoaderConfig. This can't be used in the peon/MiddleManager path because the peon does not have a SegmentLoader; the peon generates segments itself and then ships them out to deep storage and other components.

Thus, it looks like we need to introduce a similar config on the peon/MiddleManager side. The proper place seems to be the peon's additional config; in the code, that is the TaskConfig class.

On the peon code side, TaskConfig is bound in the bindTaskConfigAndClients method. It is passed into the TaskToolboxFactory class and eventually ends up inside the TaskToolbox class.

At runtime, when the peon starts, ExecutorLifecycle has its task runner of type SingleTaskBackgroundRunner run the task. SingleTaskBackgroundRunner is injected with a TaskToolboxFactory and delegates to the task to create a runner (SeekableStreamIndexTaskRunner) that executes the ingestion logic. The nice thing is that the toolbox created by TaskToolboxFactory is passed along, and it still has the TaskConfig.

The SeekableStreamIndexTaskRunner creates the appenderator via the task (of type SeekableStreamIndexTask) in its ingestion logic. Here we can take the delay configuration from the TaskConfig in the toolbox and add it as an instance variable on the appenderator.

This seems to achieve the goal of adding a drop-delay config to the peon without making it task-spec specific.

Let me know if this proposal looks good. I will make the change accordingly unless I hear otherwise.

@abhishekagarwal87
Contributor

@kaisun2000 - FWIW a peon can have segment loader config when it loads broadcast segments. I think we can use the same property name since the purpose is really just the same, and we add the relevant documentation. Property name aside, does it really need to be added to TaskConfig? Can it be accessed directly in the same way as druid.indexer.fork.property.druid.processing.numMergeBuffers is accessed?

@kaisun2000
Contributor Author

> @kaisun2000 - FWIW a peon can have segment loader config when it loads broadcast segments. I think we can use the same property name since the purpose is really just the same, and we add the relevant documentation. Property name aside, does it really need to be added to TaskConfig? Can it be accessed directly in the same way as druid.indexer.fork.property.druid.processing.numMergeBuffers is accessed?

@abhishekagarwal87, thanks for this further illustration.
Indeed, I see that StorageNodeModule is a base module that CliPeon uses via Initialization.makeInjectorWithModules(), and StorageNodeModule has the binding for SegmentLoaderConfig. So we do have druid.segmentCache.dropSegmentDelayMillis in the peon, as you said.

In this case, shall we add a SegmentLoaderConfig to the TaskToolboxFactory constructor?

@Inject
public TaskToolboxFactory(
    SegmentLoaderConfig segmentLoaderConfig,  ---> added
    TaskConfig config,
Then, later, when the StreamAppenderator is created, we can pass the SegmentLoaderConfig into its constructor?

Let me know if this sounds good. If so, I will make the change accordingly.

@kaisun2000
Contributor Author

Revised the PR to use SegmentLoaderConfig.dropSegmentDelayMillis to control the segment drop delay in the realtime path, similar to the historical segment drop path. Overview of the class changes:

  • TaskToolboxFactory is injected with an instance of SegmentLoaderConfig.
  • TaskToolbox built by TaskToolboxFactory stores a copy of SegmentLoaderConfig.
  • In the appenderator creation path, changes are made so that StreamAppenderator gets a copy of SegmentLoaderConfig from the TaskToolbox.
  • The delay-to-drop logic is implemented in StreamAppenderator.
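
For illustration, that wiring can be sketched with heavily simplified stand-in classes (the real Druid classes take many more constructor arguments; the signatures below are hypothetical):

```java
// Simplified stand-ins for the real Druid classes of the same names.
class SegmentLoaderConfig {
    private final int dropSegmentDelayMillis;

    SegmentLoaderConfig(int dropSegmentDelayMillis) {
        this.dropSegmentDelayMillis = dropSegmentDelayMillis;
    }

    int getDropSegmentDelayMillis() {
        return dropSegmentDelayMillis;
    }
}

class TaskToolbox {
    private final SegmentLoaderConfig segmentLoaderConfig;

    TaskToolbox(SegmentLoaderConfig segmentLoaderConfig) {
        this.segmentLoaderConfig = segmentLoaderConfig;
    }

    SegmentLoaderConfig getSegmentLoaderConfig() {
        return segmentLoaderConfig;
    }
}

class StreamAppenderator {
    private final int dropSegmentDelayMillis;

    // The appenderator reads the delay off the toolbox's SegmentLoaderConfig,
    // so the setting never appears in the task spec itself.
    StreamAppenderator(TaskToolbox toolbox) {
        this.dropSegmentDelayMillis = toolbox.getSegmentLoaderConfig().getDropSegmentDelayMillis();
    }

    int dropSegmentDelayMillis() {
        return dropSegmentDelayMillis;
    }
}
```

The design choice here is to reuse the existing historical-side property rather than invent a per-task setting, so one knob governs drop delays cluster-wide.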

@abhishekagarwal87, can we have another review to see if this approach reflects our discussion here?

@kaisun2000
Contributor Author

@abhishekagarwal87, ping?

@abhishekagarwal87
Contributor

Overall LGTM. I just had a question.

would avoid blocking the test case infinitely if something unexpected happens.
@kaisun2000
Contributor Author

ping?

@abhishekagarwal87 abhishekagarwal87 merged commit a5e9b14 into apache:master Jan 2, 2024
82 of 83 checks passed
@abhishekagarwal87
Contributor

@kaisun2000 - Merged. Thank you for your contribution.

@kaisun2000
Contributor Author

@abhishekagarwal87, I appreciate your effort here, and happy new year to you.

@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
Successfully merging this pull request may close these issues.

Query realtime datasource may get NullPointerException just when segment unannouncing.