
Fix containerd port exit monitoring #9580

Closed
wants to merge 1 commit into from


ZP-AlwaysWin

With the NRI plugin enabled, restarting containerd can leave it no longer listening on its port.
#8860

config.toml

[plugins."io.containerd.nri.v1.nri"]
    # Enable NRI support in containerd.
    disable = false
    # Allow connections from externally launched NRI plugins.
    disable_connections = false
    # plugin_config_path is the directory to search for plugin-specific configuration.
    plugin_config_path = "/etc/nri/conf.d"
    # plugin_path is the directory to search for plugins to launch on startup.
    plugin_path = "/opt/nri/plugins"
    # plugin_registration_timeout is the timeout for a plugin to register after connection.
    plugin_registration_timeout = "5s"
    # plugin_request_timeout is the timeout for a plugin to handle an event/request.
    plugin_request_timeout = "300s"
    # socket_path is the path of the NRI socket to create for plugins to connect to.
    socket_path = "/var/run/nri/nri.sock"

After enabling the NRI plugin and repeatedly restarting containerd, the containerd process keeps running but sometimes no longer listens on its port, so exec and other commands that rely on that port become unavailable. The following script reproduces the problem.

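#!/usr/bin/env bash
# Restart containerd in a loop until it comes back up without any listening port.
output_file="containerd_ports.txt"
while true
do
    systemctl restart containerd
    # Give containerd time to start its servers.
    sleep 5s
    # Collect the port numbers of every TCP/UDP listening socket owned by containerd.
    containerd_ports=$(netstat -tulnp | grep "containerd" | awk '{print $4}' | awk -F':' '{print $NF}')
    if [ -n "$containerd_ports" ]; then
        echo "containerd listening on port(s): $containerd_ports" >> "$output_file"
    else
        # Reproduced: no containerd-owned listening socket was found.
        echo "containerd is not listening on any port, exiting" >> "$output_file"
        exit 0
    fi
done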
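The port-extraction step of the script can be checked in isolation. The sketch below wraps the same grep/awk pipeline in a hypothetical function, extract_containerd_ports, and feeds it made-up netstat -tulnp output (the PIDs and port numbers are invented for illustration). It shows that taking the text after the last ':' of the Local Address column also handles IPv6 addresses such as :::10010.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `netstat -tulnp`-style text on stdin and prints
# the local port of every line that mentions containerd.
extract_containerd_ports() {
    # Column 4 is "Local Address"; the port is whatever follows the last ':'.
    grep "containerd" | awk '{print $4}' | awk -F':' '{print $NF}'
}

# Invented sample output: one IPv4 and one IPv6 containerd socket, plus sshd.
sample='Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:35353         0.0.0.0:*               LISTEN      812/containerd
tcp6       0      0 :::10010                :::*                    LISTEN      812/containerd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      640/sshd'

ports=$(printf '%s\n' "$sample" | extract_containerd_ports)
echo "$ports"
```

An empty result from this pipeline is exactly the condition the reproduction script treats as "not listening on any port".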

The root cause appears to be NRI initialization: enabling NRI makes several calls into the net library, which interferes with containerd's port listener. This fix moves NRI initialization before the listener is set up, so that the net library calls inside NRI can no longer interfere with it. The change does not alter any existing logic.


Signed-off-by: ZP-AlwaysWin <[email protected]>
@k8s-ci-robot

Hi @ZP-AlwaysWin. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@champtar
Contributor

champtar commented Jan 26, 2024

Facing the same issue, i.e. containerd not listening on the stream port when NRI is enabled, but it seems to happen only on server boot; restarting containerd makes it work.

@champtar
Contributor

Possible fix in NRI: containerd/nri#66

@mikebrow
Member

NRI is tied to the CRI plugin, I believe. I would have to look deeper, but I have to ask: did you try this with a default containerd config? E.g. containerd config default > /etc/containerd/config.toml

@champtar
Contributor

@mikebrow Have a look at the NRI PR; it explains the root cause.

@klihub
Member

klihub commented Jan 31, 2024

@ZP-AlwaysWin This does not look to me like a real fix for the problem. I think it just hides the bad side-effects. @champtar seems to have a plausible explanation for the root cause and is working on a fix. Let's wait for that and then verify that it gets rid of the experienced misbehavior you reported here.

@klihub
Member

klihub commented Jan 31, 2024

@ZP-AlwaysWin One more thing, just to make sure. You do have at least one containerd-launched plugin in /opt/nri/plugins, right?
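To double-check the setup asked about here, a small sketch can count the executable files in the plugin directory. count_nri_plugins is a hypothetical helper, and it assumes a containerd-launched plugin is simply an executable regular file directly inside plugin_path (/opt/nri/plugins in the config.toml above); it does not check plugin naming or whether NRI actually accepts the plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count executable regular files directly inside a
# plugin directory (defaults to /opt/nri/plugins, the plugin_path above).
count_nri_plugins() {
    local dir="${1:-/opt/nri/plugins}"
    local count=0
    local f
    for f in "$dir"/*; do
        # Only count regular, executable files; this also skips the literal
        # glob pattern left over when the directory is empty or missing.
        [ -f "$f" ] && [ -x "$f" ] && count=$((count + 1))
    done
    echo "$count"
}

n=$(count_nri_plugins)
if [ "$n" -gt 0 ]; then
    echo "found $n executable plugin(s)"
else
    echo "no containerd-launched NRI plugins found"
fi
```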

@k8s-ci-robot

PR needs rebase.


@ZP-AlwaysWin
Author

@ZP-AlwaysWin One more thing, just to make sure. You do have at least one containerd-launched plugin in /opt/nri/plugins, right?

Yes, I have it.
