
Dangling tasks fix 2 #386

Merged: 2 commits, merged Dec 13, 2024

AlexPorebski (Contributor) commented Nov 19, 2024

Hey @LGouellec
This PR is a follow-up to my PR from a month ago: #379

A short TL;DR of the problems we encountered in our project:
Our application runs in a Docker container, and during its lifetime we create and close many KafkaStream instances. After our QA team started running tests against it, we saw the container hosting the stream-processing service being closed by the OS with a message that it had reached the maximum number of tasks (threads) allowed by its configuration. After some debugging, I found that most of those tasks were related to Kafka connections.
Example listing of the tasks in a container with a running stream:

  PID  SPID TTY          TIME CMD
20271 20271 ?        00:00:01 dotnet
20271 20282 ?        00:00:00 .NET SynchManag
20271 20297 ?        00:00:00 .NET EventPipe
20271 20298 ?        00:00:00 .NET DebugPipe
20271 20299 ?        00:00:00 .NET Debugger
20271 20300 ?        00:00:00 .NET Finalizer
20271 20324 ?        00:00:13 .NET Timer
20271 20326 ?        00:00:01 .NET TP Gate
20271 20330 ?        00:00:00 .NET SigHandler
20271 20332 ?        00:00:00 .NET Sockets
20271 20333 ?        00:00:00 .NET Long Runni
20271 20335 ?        00:00:00 .NET Long Runni
20271 20339 ?        00:00:00 .NET Long Runni
20271 20343 ?        00:00:00 .NET File Watch
20271 20344 ?        00:00:00 DefaultSocketMa
20271 20345 ?        00:00:00 DefaultSocketMa
20271 20346 ?        00:00:00 DefaultSocketMa
20271 20347 ?        00:00:00 DefaultSocketMa
20271 20348 ?        00:00:00 DefaultSocketMa
20271 20349 ?        00:00:00 DefaultSocketMa
20271 20350 ?        00:00:00 DefaultSocketMa
20271 20351 ?        00:00:00 DefaultSocketMa
20271 20352 ?        00:00:00 DefaultSocketMa
20271 20353 ?        00:00:00 DefaultSocketMa
20271 20357 ?        00:00:00 .NET Long Runni
20271 20413 ?        00:00:00 rdk:main
20271 20414 ?        00:00:00 rdk:broker-1
20271 20415 ?        00:00:00 rdk:broker1
20271 20416 ?        00:00:03 .NET Long Runni
20271 20417 ?        00:00:03 .NET Long Runni
20271 20418 ?        00:00:00 Console logger
20271 10863 ?        00:00:00 .NET TP Worker
20271 11100 ?        00:00:00 rdk:main
20271 11101 ?        00:00:00 rdk:broker-1
20271 11102 ?        00:00:00 rdk:broker1
20271 11103 ?        00:00:00 .NET Long Runni
20271 11163 ?        00:00:00 .NET TP Worker
20271 11217 ?        00:00:00 .NET TP Worker
20271 11443 ?        00:00:00 .NET TP Worker
20271 11600 ?        00:00:00 .NET TP Worker
20271 11602 ?        00:00:00 .NET Tiered Com
20271 11603 ?        00:00:00 rdk:main
20271 11604 ?        00:00:00 rdk:broker-1
20271 11605 ?        00:00:00 rdk:broker1
20271 11606 ?        00:00:00 .NET Long Runni
20271 11607 ?        00:00:00 .NET Long Runni
20271 11608 ?        00:00:00 rdk:main
20271 11609 ?        00:00:00 rdk:broker-1
20271 11610 ?        00:00:00 rdk:broker1
20271 11611 ?        00:00:00 .NET Long Runni
20271 11612 ?        00:00:00 rdk:broker-1
20271 11613 ?        00:00:00 rdk:main
20271 11615 ?        00:00:00 rdk:broker-1
20271 11616 ?        00:00:00 rdk:broker1
20271 11617 ?        00:00:00 rdk:broker-1
20271 11618 ?        00:00:00 rdk:main
20271 11619 ?        00:00:00 rdk:broker-1
20271 11620 ?        00:00:00 rdk:broker1
20271 11621 ?        00:00:00 rdk:main
20271 11622 ?        00:00:00 rdk:broker-1
20271 11623 ?        00:00:00 rdk:broker1
20271 11624 ?        00:00:00 .NET Long Runni
20271 11625 ?        00:00:00 .NET Long Runni
20271 11626 ?        00:00:00 rdk:main
20271 11627 ?        00:00:00 rdk:broker-1
20271 11628 ?        00:00:00 rdk:broker1
20271 11629 ?        00:00:00 .NET Long Runni

From what I could find out, the tasks with the rdk: prefix are created by librdkafka, the library that handles the connections to Kafka, and most of them were still there after stream processing had completely shut down. My change in the October PR (calling close on the admin client) significantly reduced the number of dangling tasks, but some were still left over.
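
To compare a freshly started container with a long-running one, the thread names can be tallied per prefix. The following is a minimal, Linux-only sketch in Python, assuming procfs is mounted at /proc. It reads the same per-thread names that ps -T prints in the CMD column (the kernel keeps at most 15 characters, which is why the listing shows names such as "DefaultSocketMa"). The function names, the PREFIXES tuple and the script usage are illustrative, not part of the project.

```python
import os
import sys
from collections import Counter

# Thread-name prefixes of interest (illustrative choice).
PREFIXES = ("rdk:", "Console logger")


def thread_names(pid="self"):
    """Return the name (comm) of every thread of a process, read from procfs.

    On Linux each thread is a task listed under /proc/<pid>/task/<tid>/.
    """
    task_dir = f"/proc/{pid}/task"
    names = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as f:
                names.append(f.read().strip())
        except (FileNotFoundError, ProcessLookupError):
            # The thread exited between listdir() and reading comm; skip it.
            continue
    return names


def count_by_prefix(names, prefixes=PREFIXES):
    """Count how many thread names start with each prefix."""
    counts = Counter()
    for name in names:
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 count_tasks.py [PID]   (defaults to this process)
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    names = thread_names(pid)
    counts = count_by_prefix(names)
    print(f"total tasks: {len(names)}")
    for prefix in PREFIXES:
        print(f"{prefix!r}: {counts[prefix]}")
```

Running it against the service's PID once before any stream is started and again after all streams are closed shows whether rdk: or Console logger threads are accumulating.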

This PR includes two changes:

  • call Clear() on StreamThread.changelogReader during CompleteShutdown(): I saw that it was only used in a unit test, and calling it during thread shutdown removed all the other leftover connections
  • allow passing an ILoggerFactory to the StreamConfig constructor: each new StreamConfig created its own new LoggerFactory, even though we pass our own instance to it, so every closed stream left behind one task called "Console logger". To fix this, I propose allowing an ILoggerFactory to be passed directly to StreamConfig() and creating a separate factory only when none is passed to the constructor
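
The second change follows a common ownership pattern: use the injected dependency when one is given, and create (and own) a fallback only otherwise. This is a language-neutral sketch in Python, not the actual C# change. LoggerFactory, StreamConfig, logger_factory and the created counter are hypothetical stand-ins, not Streamiz or Microsoft.Extensions.Logging APIs, and closing only a factory the config created itself is an assumption the PR does not state.

```python
class LoggerFactory:
    """Hypothetical stand-in for a logger factory. In the .NET case each
    console logger factory runs its own "Console logger" thread."""

    created = 0  # number of factories created, i.e. logger threads started

    def __init__(self):
        LoggerFactory.created += 1
        self.closed = False

    def close(self):
        self.closed = True


class StreamConfig:
    """Hypothetical config: reuse the caller's factory if one is passed,
    otherwise create a private one."""

    def __init__(self, logger_factory=None):
        if logger_factory is None:
            # Fallback path: only now is a new factory (and thread) created.
            self.logger_factory = LoggerFactory()
            self._owns_factory = True
        else:
            # Shared instance: no new factory, no new thread per stream.
            self.logger_factory = logger_factory
            self._owns_factory = False

    def close(self):
        # Assumption: close only a factory this config created; the caller
        # stays responsible for a factory it passed in.
        if self._owns_factory:
            self.logger_factory.close()


shared = LoggerFactory()
configs = [StreamConfig(shared) for _ in range(5)]
for config in configs:
    config.close()
```

With a shared factory, any number of configs reuses the single factory created up front, and closing them leaves the shared instance open for the caller to dispose; only the fallback path adds a factory per config.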

With these two changes, I was able to reach a state where the number of tasks is the same in a freshly started container and in one that has been running for some time with all streams closed. So, no more dangling tasks; it looks like everything is disposed of now :)

AlexPorebski (Contributor, Author):

@LGouellec could you check this PR? We hope this fix could be added to v1.7.

LGouellec (Owner):

@AlexPorebski Please review my comments

LGouellec merged commit d0f0c2c into LGouellec:develop on Dec 13, 2024 (1 check passed).