Fix for issue #2590 #2685

Merged
merged 0 commits into from
Jun 13, 2023

thperchi

Hello,

I hope this message finds you well. I am writing to submit a pull request for the spring-kafka project that fixes issue #2590.

Here are the details of the pull request:

Repository: https://github.com/INTM-Group/spring-kafka
Issue: Consumer committed offsets not tracking LSO with FixTxOffsets and rollbacked transactions #2590

The change resolves the issue and adheres to the existing codebase's style and best practices.

I kindly request you to review and consider merging this pull request into the main branch. I am open to any feedback or suggestions you may have, and I am committed to addressing them promptly.

Thank you for your time and consideration. I appreciate the opportunity to contribute to the spring-kafka project.

Thibault Perchicot

@pivotal-cla

@thperchi Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

@garyrussell
Contributor

Thanks; but we can't look at your PR until you "sign" the CLA.

Do you think it would be possible to create a test that exhibits the behavior without the patch and passes with it?

@pivotal-cla

@thperchi Thank you for signing the Contributor License Agreement!

@thperchi
Author

Hi!
The CLA should be good now.
Here is a test that fails without the patch and passes with it: spring-kafka-2590-repro-v2.zip
Thanks for your time.

Member

@artembilan left a comment

Thank you for your effort, but we don't accept changes without tests in the project code that confirm them.
Your change doesn't look minor enough to be obviously correct. Plus, I see a failing test in the GH Actions run for this PR:

TransactionalContainerTests > testRollbackRecord() FAILED
    org.opentest4j.AssertionFailedError: 
    expected: null
     but was: OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at app//org.springframework.kafka.listener.TransactionalContainerTests.testRollbackRecord(TransactionalContainerTests.java:568)

Somehow it feels like this failure is related to your change, since both deal with transactions.

Thanks for understanding!

@garyrussell
Contributor

To run the tests locally, use ./gradlew clean check.

@thperchi
Author

Thank you for the review. I'm going to look into how to solve this!

long position = this.consumer.position(tp);
Long saved = this.savedPositions.get(tp);
OffsetAndMetadata comitted = this.lastCommits.get(tp);
Contributor

this.lastCommits only contains offsets for TopicPartitions for which records were obtained during the previous poll() call. this.lastCommits is cleared between polls.
This means:

  • polls returning no records will always trigger a commit for the complete assignment.
  • polls returning records for some partitions only will trigger commits for the other partitions even when the LSO did not change since the last commit.

Should the fix try to avoid unnecessary commits or is it good enough as is?
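
The behaviour described in this comment can be sketched as a small model. This is only an illustration built from the description above, not the actual patch: partitions are plain strings rather than TopicPartition objects, and the names TxOffsetFixSketch and offsetsToFix are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the per-poll decision; not the real container code. */
final class TxOffsetFixSketch {

    /**
     * Returns the offsets that would be committed after one poll.
     *
     * @param assignment  every partition currently assigned to the consumer
     * @param positions   consumer position per partition (stand-in for consumer.position(tp))
     * @param lastCommits offsets committed for records returned by the previous poll only
     */
    static Map<String, Long> offsetsToFix(List<String> assignment,
            Map<String, Long> positions, Map<String, Long> lastCommits) {

        Map<String, Long> toFix = new HashMap<>();
        for (String tp : assignment) {
            Long position = positions.get(tp);
            if (position == null) {
                continue; // no known position for this partition in the model
            }
            Long committed = lastCommits.get(tp);
            // A missing entry means the previous poll returned nothing for this
            // partition, so the model cannot tell whether the position is already
            // committed and commits it again.
            if (committed == null || position > committed) {
                toFix.put(tp, position);
            }
        }
        return toFix;
    }
}
```

With an empty lastCommits map (a poll that returned no records), every assigned partition ends up in the result, which is the first bullet above. With an entry for only one partition, the other partitions are still committed, which is the second bullet.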

Contributor

@bgK I guess there is a situation where a producer might keep rolling back transactions and the lag will increase until a successful transaction appears in the log.

I must admit that I am not comfortable doing a commit for each empty poll but if the above is true, then there is probably no choice.

I really wish the kafka folks would fix the underlying problem of reporting this bogus lag.
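
To make the growing lag concrete, here is a rough arithmetic sketch. It assumes that lag is reported as the end offset (or last stable offset) minus the committed offset, and that each rolled-back transaction occupies one offset per record plus one offset for its abort marker. The class and method names are hypothetical.

```java
/** Arithmetic illustration of the reported lag after rolled-back transactions. */
final class BogusLagSketch {

    /** Lag as assumed here: end offset minus the consumer group's committed offset. */
    static long lag(long endOffset, long committedOffset) {
        return endOffset - committedOffset;
    }

    /**
     * End offset after a number of rolled-back transactions, assuming each one
     * writes recordsPerTx records plus a single abort marker.
     */
    static long endOffsetAfterRollbacks(long startOffset, int rolledBackTxs, int recordsPerTx) {
        return startOffset + (long) rolledBackTxs * (recordsPerTx + 1);
    }

    public static void main(String[] args) {
        // The consumer has committed everything it was ever given.
        long committed = 100;
        for (int txs = 0; txs <= 3; txs++) {
            long end = endOffsetAfterRollbacks(100, txs, 10);
            // Rolled-back records are never delivered to a read_committed consumer,
            // so nothing triggers a new commit and the reported lag keeps growing.
            System.out.println(txs + " rolled-back txs -> reported lag " + lag(end, committed));
        }
    }
}
```

Under these assumptions, three rolled-back transactions of ten records each leave a reported lag of 33 even though the consumer has nothing left to process.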

Contributor

> I must admit that I am not comfortable doing a commit for each empty poll but if the above is true, then there is probably no choice.

Me neither. What I was hinting at with this comment is whether Spring-Kafka should remember the last committed offsets across polls to avoid redundant commits.
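
One way to picture the idea of remembering committed offsets across polls is the sketch below. It is a hedged illustration, not Spring-Kafka code: partitions are plain strings, and DedupCommitSketch, onCommit and offsetsToCommit are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Keeps the last committed offset per partition across polls to skip redundant commits. */
final class DedupCommitSketch {

    // Unlike a per-poll map, this state is not cleared between polls.
    private final Map<String, Long> lastCommitted = new HashMap<>();

    /** Records a commit made by normal record processing. */
    void onCommit(String tp, long offset) {
        this.lastCommitted.put(tp, offset);
    }

    /**
     * Returns only the offsets that moved past the last known commit and
     * remembers them as committed.
     */
    Map<String, Long> offsetsToCommit(List<String> assignment, Map<String, Long> positions) {
        Map<String, Long> toCommit = new HashMap<>();
        for (String tp : assignment) {
            Long position = positions.get(tp);
            if (position == null) {
                continue; // no known position for this partition in the model
            }
            Long committed = this.lastCommitted.get(tp);
            if (committed == null || position > committed) {
                toCommit.put(tp, position);
                this.lastCommitted.put(tp, position);
            }
        }
        return toCommit;
    }
}
```

In this model, a second empty poll with unchanged positions yields an empty map, so no commit is sent. A real implementation would presumably also need to drop entries for partitions that are revoked during a rebalance.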

Contributor

Right, but as I described, I believe there are cases where the lag can still increase unless we do.

I believe the proper fix is for Kafka to not report a lag if it's a bogus lag due to uncommitted, or rolled-back, records.

Contributor

bgK commented Jun 6, 2023

> Right, but as I described, I believe there are cases where the lag can still increase unless we do.

I agree Spring-Kafka should commit the offsets when the consumer position has changed after empty polls, to account for rolled-back transactions. What's unclear to me is what committing the same offsets repeatedly, when the position does not change after empty polls, would accomplish. Do you have any insight?

> I believe the proper fix is for Kafka to not report a lag if it's a bogus lag due to uncommitted, or rolled-back, records.

As far as I know, at this point, Kafka does not provide any other mechanism to report the consumer lag accurately when using transactional producers. Shouldn't Spring-Kafka make sure the situations where Kafka reports the lag properly work when using the Spring abstractions?

Contributor

> Shouldn't Spring-Kafka make sure the situations where Kafka reports the lag properly

My point is that the so-called "lag" is not really a lag, and shouldn't be reported as such.

@alograg

alograg commented Jun 12, 2023

Hi @garyrussell,
I will help with the fix.
I understand that the "lag" does not really exist, but Kafka reports it as such, which makes it a bogus lag.
And, based on your experience, it would be better to fix this in Kafka.
I looked for such a report in the Kafka JIRA and found only KAFKA-10683 - Consumer.position() Ignores Transaction Marker with read_uncommitted.
Do you have the report number for this behavior?
I plan to address this problem from both sides.
I will start by writing tests that demonstrate this error/behavior.

@alograg force-pushed the main branch 2 times, most recently from efa1472 to 2b4b570 on June 12, 2023 08:20

@garyrussell
Contributor

Yes; that is the issue that I raised.

@artembilan merged commit 2b4b570 into spring-projects:main on Jun 13, 2023

@alograg

alograg commented Jul 17, 2023

Dear @garyrussell,

I hope this finds you well. I wanted to provide you with an update regarding the known bug we discussed in the Kafka developers' mailing list. After interacting with the community, waiting for additional input, and reviewing the code, it appears that this is a known bug without a feasible solution due to the underlying handling logic.

This issue is similar to what happens with other processes, such as log compaction. Besides, the community is currently exploring a proposal to refactor the KafkaConsumer.

I was wondering if you have any preferred channel where we can further discuss this matter. I would greatly appreciate your insights and expertise in finding ways to improve the code and document workarounds for this issue.

Please let me know your thoughts and if there is a suitable platform or forum where we can continue this discussion. I look forward to collaborating with you to overcome this challenge.

Thank you for your attention to this matter.

Best regards

@garyrussell
Contributor

@alograg I am not sure I can add any value to such a discussion, but feel free to use the Discussions tab above.

@alograg

alograg commented Jul 18, 2023

Discussion
