-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(all): merge apache kafka 3.8.0 771b957 #1813
Conversation
This patch introduces the `group.version` feature flag with one version: 1) Version 1 enables the new consumer group rebalance protocol (KIP-848). Reviewers: Justine Olshan <[email protected]>
This patch updates the system tests to correctly enable the new consumer protocol/coordinator in the tests requiring them. I went with the simplest approach for now. Long term, I think that we should refactor the tests to better handle features and non-production features. I got a successful run of the consumer system tests with this patch combined with apache/kafka#16120: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1717155071--dajac--KAFKA-16860-2--29028ae0dd/2024-05-31--001./2024-05-31--001./report.html. Reviewers: Justine Olshan <[email protected]>
This patch optimizes uniform (homogenous) assignor by avoiding creating a copy of all the assignments. Instead, the assignor creates a copy only if the assignment is updated. It is a sort of copy-on-write. This change reduces the overhead of the TargetAssignmentBuilder when ran with the uniform (homogenous) assignor. Trunk: ``` Benchmark (memberCount) (partitionsToMemberRatio) (topicCount) Mode Cnt Score Error Units TargetAssignmentBuilderBenchmark.build 10000 10 100 avgt 5 24.535 ± 1.583 ms/op TargetAssignmentBuilderBenchmark.build 10000 10 1000 avgt 5 24.094 ± 0.223 ms/op JMH benchmarks done ``` ``` Benchmark (assignmentType) (assignorType) (isRackAware) (memberCount) (partitionsToMemberRatio) (subscriptionType) (topicCount) Mode Cnt Score Error Units ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 100 avgt 5 14.697 ± 0.133 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 1000 avgt 5 15.073 ± 0.135 ms/op JMH benchmarks done ``` Patch: ``` Benchmark (memberCount) (partitionsToMemberRatio) (topicCount) Mode Cnt Score Error Units TargetAssignmentBuilderBenchmark.build 10000 10 100 avgt 5 3.376 ± 0.577 ms/op TargetAssignmentBuilderBenchmark.build 10000 10 1000 avgt 5 3.731 ± 0.359 ms/op JMH benchmarks done ``` ``` Benchmark (assignmentType) (assignorType) (isRackAware) (memberCount) (partitionsToMemberRatio) (subscriptionType) (topicCount) Mode Cnt Score Error Units ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 100 avgt 5 1.975 ± 0.086 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 1000 avgt 5 2.026 ± 0.190 ms/op JMH benchmarks done ``` Reviewers: Ritika Reddy <[email protected]>, Jeff Kim <[email protected]>, Justine Olshan <[email protected]>
…gardless of in-flight heartbeats. (#16017) Fix the bug where the heartbeat is not sent when a newly created consumer is immediately closed. When there is a heartbeat request in flight and the consumer is then closed. In the current code, the HeartbeatRequestManager does not correctly send the closing heartbeat because a previous heartbeat request is still in flight. However, the closing heartbeat is only sent once, so in this situation, the broker will not know that the consumer has left the consumer group until the consumer's heartbeat times out. This situation causes the broker to wait until the consumer's heartbeat times out before triggering a consumer group rebalance, which in turn affects message consumption. Reviewers: Lianet Magrans <[email protected]>, Chia-Ping Tsai <[email protected]>
…945) When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the broker registration, so that the controller can record the storage directories. The current code for doing this has several problems, however. One is that it tends to trigger even in cases where we don't actually need it. Another is that when re-registering the broker, the broker is marked as fenced. This PR moves the handling of the re-registration case out of BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration code there will only trigger in the case where the broker sees an existing registration for itself with no directories set. This is much more targetted than the original code. Additionally, in ClusterControlManager, when re-registering the same broker, we now preserve its fencing and shutdown state, rather than clearing those. (There isn't any good reason re-registering the same broker should clear these things... this was purely an oversight.) Note that we can tell the broker is "the same" because it has the same IncarnationId. Reviewers: Gaurav Narula <[email protected]>, Igor Soarez <[email protected]>
…d topics having empty partitions (#16042) Reviewers: Chia-Ping Tsai <[email protected]>
…I spec (#16175) Reviewers: Chris Egerton <[email protected]>
This patch is a small refactoring which mainly aims at avoid to construct a copy of the new target assignment in the TargetAssignmentBuilder because the copy is not used by the caller. The change relies on the exiting tests and it does not really have an impact on performance (e.g. validated with TargetAssignmentBuilderBenchmark). Reviewers: Chia-Ping Tsai <[email protected]>
…than group max size. (#16163) Fix the bug where the group downgrade to a classic one when a member leaves, even though the consumer group size is still larger than `classicGroupMaxSize`. Reviewers: Chia-Ping Tsai <[email protected]>, David Jacot <[email protected]>
…#15165) Reviewers: Greg Harris <[email protected]>, Luke Chen <[email protected]>, Kamal Chandraprakash <[email protected]>
This patch fixes a typo in MetadataVersion.IBP_4_0_IV0. It should be 0 not O. Reviewers: Justine Olshan <[email protected]>, Jun Rao <[email protected]>, Chia-Ping Tsai <[email protected]>
We have revamped the thread idle ratio metric in apache/kafka#15835. apache/kafka#15835 (comment) describes a case where the metric loses accuracy and in order to set a lower bound to the accuracy, this patch re-adds a poll with a timeout that was removed as part of apache/kafka#15430. Reviewers: David Jacot <[email protected]>
…ers (#16151) Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers, or options.timeoutMs if specified, as transaction timeout. No transaction will be started with this timeout, but ReplicaManager.appendRecords uses this value as its timeout. Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append to allow for replication to take place. Co-Authored-By: Adrian Preston <[email protected]>
… and compare raw task configs before publishing them (#16122) Reviewers: Mickael Maison <[email protected]>
The time taken to compute a new assignment is critical. This patches extending the existing logging to log it too. This is very useful information to have. Reviewers: Luke Chen <[email protected]>
This patch reworks the `PartitionAssignor` interface to use interfaces instead of POJOs. It mainly introduces the `MemberSubscriptionSpec` interface that represents a member subscription and changes the `GroupSpec` interfaces to expose the subscriptions and the assignments via different methods. The patch does not change the performance. before: ``` Benchmark (memberCount) (partitionsToMemberRatio) (topicCount) Mode Cnt Score Error Units TargetAssignmentBuilderBenchmark.build 10000 10 100 avgt 5 3.462 ± 0.687 ms/op TargetAssignmentBuilderBenchmark.build 10000 10 1000 avgt 5 3.626 ± 0.412 ms/op JMH benchmarks done ``` after: ``` Benchmark (memberCount) (partitionsToMemberRatio) (topicCount) Mode Cnt Score Error Units TargetAssignmentBuilderBenchmark.build 10000 10 100 avgt 5 3.677 ± 0.683 ms/op TargetAssignmentBuilderBenchmark.build 10000 10 1000 avgt 5 3.991 ± 0.065 ms/op JMH benchmarks done ``` Reviewers: David Jacot <[email protected]>
When PartitionRegistration#merge() reads a PartitionChangeRecord from an older MetadataVersion, with a replica assignment change and without #directories() set, it produces a direcotry assignment of DirectoryId.UNASSIGNED. This is problematic because the MetadataVersion may not yet support directory assignments, leading to a UnwritableMetadataException in PartitionRegistration#toRecord. Since the Controller always sets directories on PartitionChangeRecord if the MetadataVersion supports it, via PartitionChangeBuilder, there's no need for PartitionRegistration#merge() to populate directories upon a replica assignment change. Reviewers: Luke Chen <[email protected]>
Allow KRaft replicas to send requests to any node (Node) not just the nodes configured in the controller.quorum.voters property. This flexibility is needed so KRaft can implement the controller.quorum.voters configuration, send request to the dynamically changing set of voters and send request to the leader endpoint (Node) discovered through the KRaft RPCs (specially BeginQuorumEpoch request and Fetch response). This was achieved by changing the RequestManager API to accept Node instead of just the replica ID. Internally, the request manager tracks connection state using the Node.idString method to match the connection management used by NetworkClient. The API for RequestManager is also changed so that the ConnectState class is not exposed in the API. This allows the request manager to reclaim heap memory for any connection that is ready. The NetworkChannel was updated to receive the endpoint information (Node) through the outbound raft request (RaftRequent.Outbound). This makes the network channel more flexible as it doesn't need to be configured with the list of all possible endpoints. RaftRequest.Outbound and RaftResponse.Inbound were updated to include the remote node instead of just the remote id. The follower state tracked by KRaft replicas was updated to include both the leader id and the leader's endpoint (Node). In this comment the node value is computed from the set of voters. In future commit this will be updated so that it is sent through KRaft RPCs. For example BeginQuorumEpoch request and Fetch response. Support for configuring controller.quorum.bootstrap.servers was added. This includes changes to KafkaConfig, QuorumConfig, etc. All of the tests using QuorumTestHarness were changed to use the controller.quorum.bootstrap.servers instead of the controller.quorum.voters for the broker configuration. Finally, the node id for the bootstrap server will be decreasing negative numbers starting with -2. Reviewers: Jason Gustafson <[email protected]>, Luke Chen <[email protected]>, Colin P. McCabe <[email protected]>
…thods (#16197) Reviewers: Greg Harris <[email protected]>, Chia-Ping Tsai <[email protected]>
… task until the shutdown timer is expired. (#16156) Reviewers: Lianet Magrans <[email protected]>, Chia-Ping Tsai <[email protected]>
…ompatibility with classic protocol members (#16145) During online migration, there could be ConsumerGroup that has members that uses the classic protocol. In the current implementation, `STALE_MEMBER_EPOCH` could be thrown in ConsumerGroup offset fetch/commit validation but it's not supported by the classic protocol. Thus this patch changed `ConsumerGroup#validateOffsetCommit` and `ConsumerGroup#validateOffsetFetch` to ensure compatibility. Reviewers: Jeff Kim <[email protected]>, David Jacot <[email protected]>
…riesAreAvailable and StorageToolTest.testFormatEmptyDirectory (#16186) Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]>
…mote storage (#16071) Reviewers: Kamal Chandraprakash<[email protected]>, Luke Chen <[email protected]>, Satish Duggana <[email protected]>
…schemas without inner schemas (#16161) Signed-off-by: Greg Harris <[email protected]> Reviewers: Chris Egerton <[email protected]>
Reviewers: Matthias J. Sax <[email protected]>, Bruno Cadonna <[email protected]>, Anna Sophie Blee-Goldman <[email protected]>
FK left-join was changed via KIP-962. This PR updates the docs accordingly. Reviewers: Ayoub Omari <[email protected]>, Matthias J. Sax <[email protected]>
…ing (#16147) This PR takes care of making the call back toTaskAssignor.onAssignmentComputed. It also contains a change to the public AssignmentConfigs API, as well as some simplifications of the StickyTaskAssignor. This PR also changes the rack information fetching to happen lazily in the case where the TaskAssignor makes its decisions without said rack information. Reviewers: Anna Sophie Blee-Goldman <[email protected]>
Fixed the calculation of the store name list based on the subtopology being accessed. Also added a new test to make sure this new functionality works as intended. Reviewers: Anna Sophie Blee-Goldman <[email protected]>
…missing (#16165) When starting up kafka logManager, we'll check stray replicas to avoid some corner cases. But this check might cause broker unable to startup if partition.metadata is missing because when startup kafka, we load log from file, and the topicId of the log is coming from partition.metadata file. So, if partition.metadata is missing, the topicId will be None, and the LogManager#isStrayKraftReplica will fail with no topicID error. The partition.metadata missing could be some storage failure, or another possible path is unclean shutdown after topic is created in the replica, but before data is flushed into partition.metadata file. This is possible because we do the flush in async way here. When finding a log without topicID, we should treat it as a stray log and then delete it. Reviewers: Luke Chen <[email protected]>, Gaurav Narula <[email protected]>
…is in the voter set (#16079) 1. Changing log message from error to info - We may expect the HW calculation to give us a smaller result than the current HW in the case of quorum reconfiguration. We will continue to not allow the HW to actually decrease. 2. Logic for finding the updated LeaderEndOffset for updateReplicaState is changed as well. We do not assume the leader is in the voter set and check the observer states as well. 3. updateLocalState now accepts an additional "lastVoterSet" param which allows us to update the leader state with the last known voters. any nodes in this set but not in voterStates will be added to voterStates and removed from observerStates, any nodes not in this set but in voterStates will be removed from voterStates and added to observerStates Reviewers: Luke Chen <[email protected]>, José Armando García Sancio <[email protected]>
…s group protocol config (#16409) This reverts commit e95e91a. With the change to include the group.version flag, these tests fail due to trying to set the feature for the old version. It is unclear if these tests originally worked as intended and given the upgrade is not expected for 3.8, we will just revert from 3.8. Reviewers: David Jacot <[email protected]>
All public interface changes should be briefly mentioned in the upgrade guide. Reviewers: Matthias J. Sax <[email protected]>, Anna Sophie Blee-Goldman <[email protected]>, Nick Telford <[email protected]>
Reverting due to complications when trying to fix KAFKA-17011 in 3.8. Now there will be no production features, so we won't send any over the wire in ApiVersions or BrokerRegistration and cause issues when the receiver is on an old version. I reverted the typo PR to make the reverts cleaner and minimize chances for errors. The only conflicts were due to imports and a modified test testConsumerGroupDescribe. The fix was to keep the modified parameters but remove the metadataCache code. Some other minor changes for items we wanted to keep (not revert) Reviewers: Colin P. McCabe <[email protected]>, David Jacot <[email protected]>, Chia-Ping Tsai <[email protected]>, Jun Rao <[email protected]>
…6400) * Revert "KAFKA-16154: Broker returns offset for LATEST_TIERED_TIMESTAMP (#15213)" * Set 3.8_IV0 as latest production version in 3.8 * Bring changes committed in KAFKA-16968 * Make ClusterTest annotation metadata default to 3.9 --------- Signed-off-by: Josep Prat <[email protected]> Reviewers: Justine Olshan <[email protected]>, Jun Rao <[email protected]>
Reviewers: Burno Cadonna <[email protected]>
…16533) LATEST_PRODUCTION version in MetadataVersion.java was updated in both #16347 and #16400, but it was left unchanged in the system tests. Reviewers: Josep Prat <[email protected]>
…16545) With enabled state updater tasks that are created but not initialized are stored in a set. In each poll iteration the stream thread drains that set, intializes the tasks, and adds them to the state updater. On partition lost, all active tasks are closed. This commit ensures that active tasks pending initialization in the set mentioned above are closed cleanly on partition lost. Reviewer: Lucas Brutschy <[email protected]>
…#16556) Reviewers: Matthias J. Sax <[email protected]>
…try (#16555) When a active tasks are revoked they land as suspended tasks in the task registry. If they are then reassigned, the tasks are resumed and put into restoration. On assignment, we first handle the tasks in the task registry and then the tasks in the state updater. That means that if a task is re-assigned after a revocation, we remove the suspended task from the task registry, resume it, add it to the state updater, and then remove it from the list of tasks to create. After that we iterate over the tasks in the state updater and remove from there the tasks that are not in the list of tasks to create. However, now the state updater contains the resumed tasks that we removed from the task registry before but are no more in the list of tasks to create. In other words, we remove the resumed tasks from the state updater and close them although we just got them assigned. This commit ensures that we first handle the tasks in the state updater and then the tasks in the task registry. Reviewer: Lucas Brutschy <[email protected]>
… and JsonDeserializer (#16565) Reviewers: Mickael Maison <[email protected]>, Josep Prat <[email protected]>, Greg Harris <[email protected]>
… (#16570) When Streams tries to transit a restored active task to RUNNING, the first thing it does is getting the committed offsets for this task. If getting the offsets expires a timeout, Streams does not re-throw the error initially, but tries to get the committed offsets later until a Streams-specific timeout is hit. Restored active tasks from the state updater are removed from the output queue of the restored tasks in the state updater. If a timeout occurs, the restored task is neither added to the task registry nor re-added to the state updater. The task is lost since it is not maintained anywhere. This means the task is also not closed. When the same task is created again on the same stream thread since the stream thread does not know about this lost task, the state stores are opened again and RocksDB will throw the "No locks available" error. This commit re-adds the task to the state updater if the committed request times out. Reviewers: Lucas Brutschy <[email protected]>
…eTopics (#16217) Reviewers: Andrew Schofield <[email protected]>, Lianet Magrans <[email protected]>, David Arthur <[email protected]>
Fix to complete Future which was stuck due the exception.getCause() throws an error. The fix in the #16217 unblocked blocking thread but exception in catch block from blocking get call was wrapped in ExecutionException which is not the case when moved to async workflow hence getCause is not needed. Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]>
…aliases correctly (#16608) Signed-off-by: Greg Harris <[email protected]> Reviewers: Chris Egerton <[email protected]>, Chia-Ping Tsai <[email protected]>, Josep Prat <[email protected]>
…ool (#16607) (3.8) (#16617) Reviewers: Chia-Ping Tsai <[email protected]>, Greg Harris <[email protected]>, Chris Egerton <[email protected]> Co-authored-by: Dmitry Werner <[email protected]>
…ics (#16614) Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]>
…ryMetrics (#16641) Reviewers: Okada Haruki <[email protected]>, Chia-Ping Tsai <[email protected]>
Signed-off-by: Robin Han <[email protected]>
Signed-off-by: Robin Han <[email protected]>
|
Signed-off-by: Robin Han <[email protected]>
Main Review Files: