Fix leaking FDs in the run (_call) function #880

Merged: 2 commits, Aug 12, 2024

Contributor

@t184256 t184256 commented Aug 12, 2024

updated by @pirat89

The original implementation of the _call function (used by run from actors) leaked file descriptors on every invocation. An actor that called it too many times during a single execution could therefore hit the limit on open FDs and fail with OSError: [Errno 24] Too many open files.

There were actually 2 issues:

  • EventLoop (an alias for select.epoll) was created before the fork, so the child process inherited the created FD too
    • this was not a big problem in practice: the child process always ended before the parent finished the function, so the OS always closed the inherited FD safely. It is changed anyway to make the code cleaner.
  • the stdout and stderr pipes (used to consume output from the child process) were created before the fork but were never closed in the parent
    • Because the parent process did not close these FDs, 2 FDs leaked per call and stayed open until the end of the actor's execution. Closing them after the EventLoop is closed (and after the child process has died) resolves this issue.
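
The two points above can be illustrated with a minimal sketch. This is not the leapp _call implementation: run_and_capture is a hypothetical name, and the sketch assumes Linux (select.epoll) and Python 3. It creates the epoll object only after the fork and closes the parent's pipe read ends once the epoll object is closed and the child has been reaped.

```python
import os
import select


def run_and_capture(argv):
    """Run argv in a child process; return (exit_code, stdout, stderr) as bytes."""
    # The pipes have to exist before the fork so both processes share them.
    out_r, out_w = os.pipe()
    err_r, err_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: point stdout/stderr at the write ends, drop the pipe FDs, exec.
        try:
            os.dup2(out_w, 1)
            os.dup2(err_w, 2)
            for fd in (out_r, out_w, err_r, err_w):
                if fd > 2:
                    os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached when the setup or the exec failed

    # Parent: close our copies of the write ends, otherwise the read ends
    # would never report EOF.
    os.close(out_w)
    os.close(err_w)

    chunks = {out_r: [], err_r: []}
    try:
        # Created after the fork, so the child never inherits the epoll FD.
        poller = select.epoll()
        try:
            for fd in chunks:
                poller.register(fd, select.EPOLLIN | select.EPOLLHUP)
            pending = set(chunks)
            while pending:
                for fd, _event in poller.poll():
                    data = os.read(fd, 65536)
                    if data:
                        chunks[fd].append(data)
                    else:
                        # EOF: the child closed its end of this pipe.
                        poller.unregister(fd)
                        pending.discard(fd)
        finally:
            poller.close()
        _, status = os.waitpid(pid, 0)
    finally:
        # The step the description reports as missing: close the read ends
        # in the parent so nothing stays open after the call returns.
        os.close(out_r)
        os.close(err_r)

    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        exit_code = -os.WTERMSIG(status)
    return exit_code, b"".join(chunks[out_r]), b"".join(chunks[err_r])
```

With this structure every FD opened by the call is closed before it returns, so repeated calls should leave the number of open FDs in the parent unchanged.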

Warning: totally untested (@pirat89 manually tested on rhel7 & rhel9)

Thank you for contributing to the Leapp project!

Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable.
If you want to request a review or rebuild a package in copr, you can use the following commands as a comment:

  • review please @oamg/developers to notify leapp developers of the review request
  • /packit copr-build to submit a public copr build using packit

To launch regression testing, public members of the oamg organization can leave one of the following comments:

  • /rerun to schedule basic regression tests using this pr build and leapp-repository*master* as artifacts
  • /rerun 42 to schedule basic regression tests using this pr build and leapp-repository*PR42* as artifacts
  • /rerun-sst to schedule sst tests using this pr build and leapp-repository*master* as artifacts
  • /rerun-sst 42 to schedule sst tests using this pr build and leapp-repository*PR42* as artifacts

Please open a ticket if you experience a technical problem with the CI. (RH internal only)

Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please consider rerunning the CI by commenting leapp-ci build (might require several comments). If the problem persists, contact leapp-infra.

@pirat89
Member

pirat89 commented Aug 12, 2024

/packit copr-build

@pirat89
Member

pirat89 commented Aug 12, 2024

/packit copr-build

@pirat89
Member

pirat89 commented Aug 12, 2024

Tested, and it's still broken with the same error for IPU 9 -> 10. Let's see whether we can fix it before the release; otherwise let's keep it for the next release. We currently have just ~4d left for changes in this release, and the issue has been discovered only for IPU 9 -> 10.

@pirat89
Member

pirat89 commented Aug 12, 2024

/packit copr-build

Member

@pirat89 pirat89 left a comment

Seems it works as expected now. Tested manually (rhel7 & rhel9); the stdout/stderr FDs really were leaking too. Let's wait for the test results.

@pirat89 pirat89 added this to the 8.10/9.6 milestone Aug 12, 2024
@pirat89 pirat89 added the changelog-checked label Aug 12, 2024
@pirat89 pirat89 changed the title from "Fix leaking epoll FDs" to "Fix leaking FDs in the run (_call) function" Aug 12, 2024
Member

@pirat89 pirat89 left a comment

LGTM and works as expected. Double-checked with @abadger that it should be safe enough even this close to the release. Merging.

@pirat89 pirat89 modified the milestones: 8.10/9.6, 8.10/9.5 Aug 12, 2024
@pirat89 pirat89 merged commit 5248cb1 into oamg:master Aug 12, 2024
19 of 21 checks passed
pirat89 pushed a commit to oamg/leapp-repository that referenced this pull request Aug 13, 2024
Reverting commit 60f500e

The original commit only worked around the root cause: leaked file descriptors in the leapp stdlib when using the `run` function. Dropping the change in the actor as it is no longer needed.

relates: oamg/leapp#880
pirat89 added a commit to pirat89/leapp that referenced this pull request Aug 16, 2024
## Packaging
- Start building for EL 9 in the upstream repository on COPR (oamg#855)

## Framework
### Enhancements
- Minor update in the summary overview to highlight what is present in the pre-upgrade report (oamg#858)
- Store metadata about actors, workflows, and dialogs inside leapp audit db (oamg#847, oamg#867)

## Leapp (tool)
### Enhancements
- Implement singleton leapp execution to prevent multiple leapp instances from running on the system at the same time (oamg#851)

## stdlib
### Fixes
- Properly close all file descriptors when executing shell commands via `run` (oamg#880)

## Modifications
- Code is now Python3.12 compatible (oamg#855)
@pirat89 pirat89 mentioned this pull request Aug 16, 2024
pirat89 added a commit that referenced this pull request Aug 16, 2024
## Packaging
- Start building for EL 9 in the upstream repository on COPR (#855)

## Framework
### Enhancements
- Minor update in the summary overview to highlight what is present in the pre-upgrade report (#858)
- Store metadata about actors, workflows, and dialogs inside leapp audit db (#847, #867)

## Leapp (tool)
### Enhancements
- Implement singleton leapp execution to prevent multiple leapp instances from running on the system at the same time (#851)

## stdlib
### Fixes
- Properly close all file descriptors when executing shell commands via `run` (#880)

## Modifications
- Code is now Python3.12 compatible (#855)
yuravk pushed a commit to yuravk/leapp-repository that referenced this pull request Aug 19, 2024
Reverting commit 60f500e

The original commit only worked around the root cause: leaked file descriptors in the leapp stdlib when using the `run` function. Dropping the change in the actor as it is no longer needed.

relates: oamg/leapp#880
(cherry picked from commit 24700ee)
yuravk pushed a commit to yuravk/leapp-repository that referenced this pull request Aug 20, 2024
Reverting commit 60f500e

The original commit only worked around the root cause: leaked file descriptors in the leapp stdlib when using the `run` function. Dropping the change in the actor as it is no longer needed.

relates: oamg/leapp#880
(cherry picked from commit 24700ee)
Labels: bug, changelog-checked (The merger/reviewer checked the changelog draft document and updated it when relevant)