-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make ingestion delay configurable: with concurrency fixes #14142
base: master
Are you sure you want to change the base?
Make ingestion delay configurable: with concurrency fixes #14142
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14142 +/- ##
=============================================
- Coverage 61.75% 34.18% -27.57%
- Complexity 207 765 +558
=============================================
Files 2436 2660 +224
Lines 133233 145739 +12506
Branches 20636 22297 +1661
=============================================
- Hits 82274 49826 -32448
- Misses 44911 91894 +46983
+ Partials 6048 4019 -2029
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/IngestionDelayTracker.java
Show resolved
Hide resolved
private final Map<Integer, Long> _partitionsMarkedForVerification = new ConcurrentHashMap<>(); | ||
|
||
private final Cache<String, Boolean> _segmentsToIgnore = | ||
CacheBuilder.newBuilder().expireAfterAccess(IGNORED_SEGMENT_CACHE_TIME_MINUTES, TimeUnit.MINUTES).build(); | ||
|
||
// TODO: Make thread pool a server/cluster level config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, we should keep this TODO
volatile StreamPartitionMsgOffset _latestOffset; | ||
final StreamMetadataProvider _streamMetadataProvider; | ||
|
||
IngestionInfo(@Nullable Long ingestionTimeMs, @Nullable Long firstStreamIngestionTimeMs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are ingestionTimeMs
or firstStreamIngestionTimeMs
ever null
?
|
||
IngestionInfo(@Nullable Long ingestionTimeMs, @Nullable Long firstStreamIngestionTimeMs, | ||
@Nullable StreamPartitionMsgOffset currentOffset, | ||
@Nullable StreamMetadataProvider streamMetadataProvider) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
streamMetadataProvider
is also always non-null, and we should be able to remove several unnecessary null checks
volatile Long _firstStreamIngestionTimeMs; | ||
volatile StreamPartitionMsgOffset _currentOffset; | ||
volatile StreamPartitionMsgOffset _latestOffset; | ||
final StreamMetadataProvider _streamMetadataProvider; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Suggest moving this to the top since it is final
StreamPartitionMsgOffset currentOffset = ingestionInfo._currentOffset; | ||
StreamPartitionMsgOffset latestOffset = ingestionInfo._latestOffset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(MAJOR) This is almost never accurate because current offset and latest offset are updated separately. We probably should directly track the lag and update the lag when fetching the latest offset.
Seems currentOffset
is used only here, so we can just remove both offsets
and only keep lag
in IngestionInfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the issue, that we can get the currentOffset only via the updateIngestionMetric method call
and the latest offset we are calculating in seperate periodic thread
if I just want to store the lag, i'll either have to calculate latestOffset every time updateIngestionMetrics
is called with currentOffset for the partition. I can't do this since it'll defeat the purpose of this PR.
throws RuntimeException { | ||
_serverMetrics = serverMetrics; | ||
_tableNameWithType = tableNameWithType; | ||
_metricName = tableNameWithType; | ||
_realTimeTableDataManager = realtimeTableDataManager; | ||
_clock = Clock.systemUTC(); | ||
_isServerReadyToServeQueries = isServerReadyToServeQueries; | ||
|
||
StreamConfig streamConfig = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should directly pass in StreamConfig
to avoid parsing it multiple times
@swaminathanmanish Please help review this |
85de7c9
to
6ccff15
Compare
Before the PR, the logic is:
This PR:
To solve the above problem:
|
Alternatively, we can actually directly calculate the lag in the |
…an configured period
@Jackie-Jiang went with the first alternative |
I kind of prefer the second way, so that we don't need to manage metadata fetch within |
We ran into some errors with previous version #14074 and hence it was reverted with #14127 . Here's the new PR with multiple fixes after the revert
The ingestion offset lag metric currently makes a request to kafka broker to get the latest offset on every update.
However, this has lead to an increasing amount of load on kafka brokers for multiple users.
The PR adds a way to enable/disable this metric and also configure its interval using the following properties in the server or cluster configs