Tight loop while trying to manage consistencyTest topic #182
The test run eventually finished with:
Which suggests the Testcontainers integration was failing, but we aren't handling that effectively.
I also see
Which lends further weight to problems with the cluster not being properly propagated.
I'll take a look. Anything interesting in the container logs (./junit5-extension/target/container-logs)?
Can't reproduce it yet. Could this have been related to a slow pull of the new image in your environment?
No, I didn't get the container logs. It's not deterministic for me, but I have seen it a few times.
Are you running under podman? Are you applying?
I finally got a failure. It took several hours to appear.

2023-09-21 13:53:30 WARN ForkJoinPool.commonPool-worker-3 io.kroxylicious.testing.kafka.common.Utils:214 - Failed to create topic: __org_kroxylicious_testing_consistencyTest due to org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics

TimeoutException is a RetriableException, so that might explain the loop. I wonder if we are leaking an admin client, or whether the loop within createTopics is continuing even though the client is closed?
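For illustration, here is a minimal sketch (an assumed naive retry-on-RetriableException loop, not the actual Utils code) of why a closed admin client would turn topic creation into a tight loop, given that TimeoutException extends RetriableException:

import java.util.List;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.RetriableException;

class CreateTopicRetrySketch {
    // Retries while the failure is "retriable". Once the Admin client has been
    // closed, every attempt fails with TimeoutException (a RetriableException),
    // so the loop never makes progress and never exits.
    static void createWithRetry(Admin admin, NewTopic topic) throws InterruptedException {
        while (true) {
            try {
                admin.createTopics(List.of(topic)).all().get();
                return;
            }
            catch (ExecutionException e) {
                if (e.getCause() instanceof RetriableException) {
                    continue; // tight loop when the client is already closed
                }
                throw new RuntimeException(e.getCause());
            }
        }
    }
}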
Yeah, the admin client produces an exception with this message after the client is closed. It seems bizarre for it to use a RetriableException; the client isn't going to come back to life. This seems like a very odd choice for the Kafka client (@showuon WDYT?). I think the looping is probably a secondary cause, so I guess we are still looking for the root cause of your initial failure.

From org.apache.kafka.clients.admin.KafkaAdminClient.AdminClientRunnable#call:

void call(Call call, long now) {
if (hardShutdownTimeMs.get() != INVALID_SHUTDOWN_TIME) {
log.debug("The AdminClient is not accepting new calls. Timing out {}.", call);
call.handleTimeoutFailure(time.milliseconds(),
new TimeoutException("The AdminClient thread is not accepting new calls."));
} else {
enqueue(call, now);
}
}
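A possible test-side guard (a hypothetical sketch, not necessarily what the eventual workaround does) is to treat a TimeoutException carrying that "not accepting new calls" message as fatal rather than retriable, so a retry loop like the one sketched above can bail out instead of spinning:

import org.apache.kafka.common.errors.TimeoutException;

final class ClosedAdminClientCheck {
    // Matching on the message text is brittle, but until KAFKA-15507 is fixed it
    // is about the only signal KafkaAdminClient exposes that the client was closed.
    static boolean isClientClosed(Throwable t) {
        return t instanceof TimeoutException
                && t.getMessage() != null
                && t.getMessage().contains("The AdminClient thread is not accepting new calls");
    }
}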
@SamBarker I only saw one test failure that looks like yours today, despite running in a loop for most of the day. I did keep seeing #183, but haven't investigated that yet. I'm curious what you see with my PR. I suspect that in your case the topic creation loop will be a secondary issue, and there will be a root-cause failure that is still to be understood/dealt with.
The topic creation loop is definitely a symptom, yes.
Yes, podman, and apparently no, it had been reverted :( So that is probably the root cause.
Ok, so I think the createTopic loop still deserves to be fixed.
Nice find @k-wall! Yes, it's definitely a bug in the AdminClient. I've filed KAFKA-15507 to see if anyone is interested in picking it up. Otherwise, I'll fix it later when I'm available.
…t gets closed (#184) workaround for KAFKA-15507
Please use this only for bug reports. For questions or when you need help, use GitHub Discussions or the community Slack chat.
Describe the bug
The test suite sometimes gets itself into a state where it logs the following ad nauseam.
The following stack trace is also logged regularly and is probably more indicative of the underlying cause.
To Reproduce
Steps to reproduce the behavior:
/shrug
Expected behavior
A clear and concise description of what you expected to happen.
Logs
Attach or copy and paste relevant logs.
Additional context
Add any other context about the problem here.