Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support empty segments for realtime tables #13362

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tibrewalpratik17
Copy link
Contributor

Addresses #12703

This patch allows persistence of empty segments in case of no messages in realtime ingestion. Right now, it can happen that a segment is stuck in CONSUMING state for months / years if no offset advances for it.
We also add support for retention manager finding empty segments aggressively and purging them.

@codecov-commenter
Copy link

codecov-commenter commented Jun 11, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 61.98%. Comparing base (59551e4) to head (12a7068).
Report is 677 commits behind head on master.

Files Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 50.00% 1 Missing ⚠️
.../starter/helix/HelixInstanceDataManagerConfig.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13362      +/-   ##
============================================
+ Coverage     61.75%   61.98%   +0.23%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2559     +123     
  Lines        133233   141215    +7982     
  Branches      20636    21916    +1280     
============================================
+ Hits          82274    87533    +5259     
- Misses        44911    47025    +2114     
- Partials       6048     6657     +609     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.94% <93.33%> (+0.23%) ⬆️
java-21 61.86% <93.33%> (+0.24%) ⬆️
skip-bytebuffers-false 61.97% <93.33%> (+0.23%) ⬆️
skip-bytebuffers-true 61.83% <93.33%> (+34.10%) ⬆️
temurin 61.98% <93.33%> (+0.23%) ⬆️
unittests 61.98% <93.33%> (+0.23%) ⬆️
unittests1 46.67% <66.66%> (-0.22%) ⬇️
unittests2 27.51% <86.66%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tibrewalpratik17 tibrewalpratik17 force-pushed the support_empty_segments branch from 46469b5 to 5eb16f4 Compare June 11, 2024 12:38
@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review June 11, 2024 16:47
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @sajjad-moradi @mcvsubbu Do you have concern on allowing committing empty segment?

@tibrewalpratik17 tibrewalpratik17 removed the request for review from aaronucsd June 18, 2024 15:14
@mcvsubbu
Copy link
Contributor

Sajjad and I will review this and get back to you. Please hold merging for a bit.

@mcvsubbu
Copy link
Contributor

I believe we already allow empty segments during force commit, so that by itself should not be a problem. @sajjad-moradi ?

@tibrewalpratik17
Copy link
Contributor Author

I believe we already allow empty segments during force commit, so that by itself should not be a problem.

+1!
And we are ensuring here that we don't remove the last committed empty segment for keeping the metadata info so should be pretty safe imo.

@sajjad-moradi
Copy link
Contributor

Committing an empty segment is fine. I'm concerned that deleting the empty ones might make debugging a bit harder. If something goes wrong with segments of a partition, it would be hard by looking at IdealState, ExternalView, or SEGMENTS in PropertyStore to distinguish if a segment was purged due no-data case or something has indeed gone wrong.

@tibrewalpratik17
Copy link
Contributor Author

I'm concerned that deleting the empty ones might make debugging a bit harder. If something goes wrong with segments of a partition, it would be hard by looking at IdealState, ExternalView, or SEGMENTS in PropertyStore to distinguish if a segment was purged due no-data case or something has indeed gone wrong.

In that case, we can enable the empty segment retention manager behind a config: deleteEmptySegments by default we can keep it to false. Ideally keeping a lot of empty segments in a table doesn't make sense if the commit threshold time is few hours and retention is high. Wdyt?

@Jackie-Jiang
Copy link
Contributor

What is the current mechanism of removing empty segments?

@tibrewalpratik17
Copy link
Contributor Author

tibrewalpratik17 commented Jun 20, 2024

What is the current mechanism of removing empty segments?

I believe the time-based retention manager should handle this since we get segment.end.time information in the metadata for empty segments as well. However, a specific scenario for upserts is that if we set indefinite retention and rely on UpsertCompaction to purge segments, empty segments won't be included. This is because we don't add EmptyIndexSegment to _trackedSegments, which ultimately leads to them being skipped in the snapshot flow as well.

@Jackie-Jiang
Copy link
Contributor

Good point on the infinite retention case. I'm okay always removing empty segment as long as it is not the last committed one (used to resume consumption or mark the end of a partition). I think it is okay because removed segments are either expired or empty. I don't see much value keeping all empty segments around even for debugging purpose. @sajjad-moradi wdyt?

@sajjad-moradi
Copy link
Contributor

sajjad-moradi commented Jun 24, 2024

We won't have a lot of segments if we don't commit empty segments. A good thing about the existing behavior is that when we see a consuming segment that's started a long time ago, we can immediately tell that there hasn't been any new events on the stream, without needing for further investigation!

To get around the issue that this PR is trying to fix, can't we just issue a force commit, and then deleted those segments?
I don't like the idea of committing the empty consuming segments and then removing them in another task, retention manager. Retention manager, as its name suggests, should only delete segments that are older than the retention of the table.

If you guys still think it's better to go with the approach suggested in this PR, let's have a config for committing empty consuming segment, and keep the existing behavior behind the default value of that config.

@Jackie-Jiang
Copy link
Contributor

@sajjad-moradi One problem with force commit is that it won't commit the empty segment for the same reason.
I'm okay adding a config to allow committing empty segment, then allow retention manager to remove it if it is not the last committed segment.

@tibrewalpratik17
Copy link
Contributor Author

when we see a consuming segment that's started a long time ago, we can immediately tell that there hasn't been any new events on the stream, without needing for further investigation!

We can monitor various metrics to determine if this scenario occurs after allowing this empty segments behaviour.

I will move this empty segment behaviour behind a config but somehow I am not convinced to not make this default behaviour just because of better debuggability. It's just that overtime we will have to manage a lot of configs for getting the correct functionality out of Pinot.

@tibrewalpratik17 tibrewalpratik17 force-pushed the support_empty_segments branch from 8cff8a6 to 246e376 Compare June 25, 2024 10:49
@tibrewalpratik17 tibrewalpratik17 force-pushed the support_empty_segments branch from 246e376 to 12a7068 Compare June 25, 2024 10:53
}
_retentionStrategies.add(retentionStrategy);
if (TableNameBuilder.isRealtimeTableResource(tableConfig.getTableName())) {
_retentionStrategies.add(new EmptySegmentRetentionStrategy());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a functional here to avoid creating a new class EmptySegmentRetentionStrategy:

Suggested change
_retentionStrategies.add(new EmptySegmentRetentionStrategy());
_retentionStrategies.add((tableName, segZkMetadata) -> segZkMetadata.getTotalDocs() == 0);

return;
}
_retentionStrategies.add(retentionStrategy);
if (TableNameBuilder.isRealtimeTableResource(tableConfig.getTableName())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this behind the config as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking about that but we don't get instanceDataConfig in the retention strategy. I kept this config at a cluster-level for easier rollout.
Either we should then define this config at a table level or pass instanceDataConfig here? Thoughts? cc @Jackie-Jiang

@mcvsubbu
Copy link
Contributor

There are at least a couple of more considerations I can think of. @tibrewalpratik17 can you please follow up on these?

  • The server keeps a rolling statistics buffer on one of the segments of the table that helps in deciding how much memory to allocate for consuming segments. A commit of a zero-sized segment may end up skewing these statistics unfavorably. See ReatimeSegmentStatsHistory.
  • When one segment completes, the controller computes the number of rows to be consumed for the next segment, based on the size of the current segment. Will this computation also get skewed with zero-sized segments? See FlushThresholdUpdater and implementations.

@tibrewalpratik17
Copy link
Contributor Author

Thanks @mcvsubbu for the pointers!

Briefly going through the code, I observe the following:

The server keeps a rolling statistics buffer on one of the segments of the table that helps in deciding how much memory to allocate for consuming segments. A commit of a zero-sized segment may end up skewing these statistics unfavorably. See ReatimeSegmentStatsHistory.

I noticed that we only update the stats when _numRowsIndexed is strictly greater than 0 (Ref). Therefore, the behavior remains the same: if we do not commit an empty segment, the stats history is not updated, and this applies to the current scenario as well.

When one segment completes, the controller computes the number of rows to be consumed for the next segment, based on the size of the current segment. Will this computation also get skewed with zero-sized segments? See FlushThresholdUpdater and implementations.

Reviewing the implementation of SegmentSizeBasedFlushThresholdUpdater, it appears that empty segments are already handled (Ref), as the new threshold is set to the same value as the previous committed segment threshold. In the case of committing consecutive empty segments, if a table's partition has never received a message, it defaults to the autotuneInitialRows configuration value (Ref).

@Jackie-Jiang
Copy link
Contributor

After a second thought, the root cause of the problem is that: retention manager always keeps the last completed segment even if it is already expired. I think we need to fix this behavior because it is violating the retention agreement.
@sajjad-moradi Do you think consumption pause/resume can work if we cleanup the last committed segment? Basically we should resume from the earliest offset
@KKcorps Does Kinesis work if we clean up the last committed segment for a closed partition?

@tibrewalpratik17
Copy link
Contributor Author

retention manager always keeps the last completed segment even if it is already expired. I think we need to fix this behavior because it is violating the retention agreement.

but for very high / indefinite retention we would still end up with the consuming segment open for quite long.

@Jackie-Jiang
Copy link
Contributor

@tibrewalpratik17 Consuming segment will be opened for long time only if there is no event consumed. As long as we can clean up all the expired segments it should be fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants