Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Jakuje · 2024-11-14T14:23:18Z

SUMMARY

The function exec_command() keeps the callbacks as a local variable before assigning them to the created channel. The channel is not guaranteed to be completely freed when ssh_channel_free() is called because there might be some leftover messages or responses to process (close confirmation, exit code ...).

Calling the exec_command() as done previously in the test from the same function without anything in between (except assert) will likely map the second function call to the same memory on the call stack so it was working most of the time. But calling it from different functions or contexts will likely change the call stack and processing of outstanding callbacks is more likely to result in addressing wrong memory location.

Likely fixes #57, #645 and #657

ISSUE TYPE

Bugfix Pull Request

ADDITIONAL INFORMATION

I was not able to reproduce the issue locally so pushing to see if the CI will be able to crash.

This is also introducing memory leaks as the callback structure is never freed. We should probably store it somewhere in the python code before returning to make sure it is not garbage collected (or can the python GC track the callback pointer is still stored on the libssh side?).

packit-as-a-service · 2024-11-14T14:48:19Z

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

sudo yum install -y dnf-plugins-core on RHEL 8
sudo dnf install -y dnf-plugins-core on Fedora
dnf copr enable packit/ansible-pylibssh-658
And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

tests/unit/channel_test.py

docs/changelog-fragments/658.bugfix.rst

tests/unit/channel_test.py

src/pylibsshext/channel.pyx

tests/unit/channel_test.py

src/pylibsshext/channel.pyx

webknjaz · 2024-11-19T16:31:25Z

@Jakuje it looks like this is making some CI jobs get stuck: https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210235668?pr=658 / https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210009382?pr=658.
(I cancelled the 3.9 one after 39 minutes, and the 3.10 one was killed by xdist after a 30-second timeout)

webknjaz · 2024-11-19T16:32:32Z

I restarted said jobs, but this is something to look into, as it'll probably make the CI flakier if merged.

webknjaz · 2024-11-19T16:34:29Z

@Jakuje the rawhide test failure log is similar to those in GHA: https://download.copr.fedorainfracloud.org/results/packit/ansible-pylibssh-658/fedora-rawhide-x86_64/08279621-python-ansible-pylibssh/builder-live.log.gz.

webknjaz · 2024-11-19T17:08:17Z

OTOH, it's also unstable on devel: https://github.com/ansible/pylibssh/actions/runs/11917417576/job/33212843276

webknjaz · 2024-11-20T14:46:53Z

So the failure @ https://github.com/ansible/pylibssh/actions/runs/11935006968/job/33265619653?pr=658#step:15:181 is referring to https://github.com/ansible/pylibssh/blob/6cbf2dd/src/pylibsshext/channel.pyx#L45.
And for some reason, the result variable in that context seems to contain an instance of a pytest-internal object — ReprTraceback (https://github.com/pytest-dev/pytest/blob/72f17d1/src/_pytest/_code/code.py#L1156-L1180).

This is rather bizzare. How did it get in there?

============================= test session starts ==============================
platform linux -- Python 3.8.18, pytest-8.3.3, pluggy-1.5.0
cachedir: .tox/test-source-dists/.pytest_cache
rootdir: /home/runner/work/pylibssh/pylibssh
configfile: pytest.ini
testpaths: tests/
plugins: cov-5.0.0, forked-1.6.0, xdist-3.6.1
created: 4/4 workers
4 workers [25 items]

Timeout (0:00:30)!
Thread 0x00007fd72445b700 (most recent call first):
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 534 in read
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 567 in from_io
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1160 in _thread_receiver
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 341 in run
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn

Thread 0x00007fd7255ff740 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1780 in _try_wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1822 in _wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1083 in wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 342 in call
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 359 in check_call
  File "/home/runner/work/pylibssh/pylibssh/tests/conftest.py", line 103 in ssh_clientkey_path
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 898 in call_fixture_func
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1140 in pytest_fixture_setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1091 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 532 in getfixturevalue
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 697 in _fillfixtures
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/python.py", line 1630 in setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 514 in setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 160 in pytest_runtest_setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 195 in run_one_test
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 174 in pytest_runtestloop
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 337 in _main
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 393 in <module>
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1291 in executetask
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 341 in run
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 389 in integrate_as_primary_thread
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1273 in serve
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1806 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
..FE......................                                               [100%]
==================================== ERRORS ====================================
____________________ ERROR at teardown of test_exec_command ____________________
[gw0] linux -- Python 3.8.18 /home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/bin/python

>   result.stdout += data_b
E   AttributeError: 'ReprTraceback' object has no attribute 'stdout'

BytesIO    = <class '_io.BytesIO'>
Channel    = <class 'pylibsshext.channel.Channel'>
ChannelCallback = <class 'pylibsshext.channel.ChannelCallback'>
CompletedProcess = <class 'subprocess.CompletedProcess'>
LibsshChannelReadFailure = <class 'pylibsshext.errors.LibsshChannelReadFailure'>
__builtins__ = <builtins>
__doc__    = None
__file__   = '/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pylibsshext/channel.cpython-38-x86_64-linux-gnu.so'
__loader__ = <_frozen_importlib_external.ExtensionFileLoader object at 0x7f1714238b50>
__name__   = 'pylibsshext.channel'
__package__ = 'pylibsshext'
__reduce_cython__ = <cyfunction Channel.__reduce_cython__ at 0x7f17141352b0>
__setstate_cython__ = <cyfunction Channel.__setstate_cython__ at 0x7f1714133450>
__spec__   = ModuleSpec(name='pylibsshext.channel', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x7f1714238b50...libssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pylibsshext/channel.cpython-38-x86_64-linux-gnu.so')
__test__   = {}
signal     = <module 'signal' from '/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/signal.py'>
time       = <module 'time' (built-in)>

src/pylibsshext/channel.pyx:45: AttributeError

The above exception was the direct cause of the following exception:

ssh_client_session = <pylibsshext.session.Session object at 0x7f170fbccca0>

    @pytest.fixture
    def ssh_channel(ssh_client_session):
        """Initialize a channel and tear it down after testing.
    
        :param ssh_client_session: A pre-authenticated SSH session.
        :type ssh_client_session: pylibsshext.session.Session
    
        :yields: A libssh channel instance.
        :ytype: pylibsshext.channel.Channel
        """
        chan = ssh_client_session.new_channel()
        try:  # noqa: WPS501
            yield chan
        finally:
>           chan.close()
E           SystemError: <cyfunction Channel.close at 0x7f17141351e0> returned a result with an error set

chan       = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
ssh_client_session = <pylibsshext.session.Session object at 0x7f170fbccca0>

tests/unit/channel_test.py:33: SystemError
---------------------------- Captured stdout setup -----------------------------
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key.pub
The key fingerprint is:
SHA256:Us8tQjYuzYhTfcblu4lb5wrgGe2ldOtFQ++NgSVUrug ansible-pylibssh integration tests key
The key's randomart image is:
+---[RSA 8192]----+
|            o..  |
|       . . + .   |
|      . * + o +  |
|     o O B o B . |
|    o + S B B + .|
|     . + O * = =.|
|        o E = + o|
|           = +   |
|          . o..  |
+----[SHA256]-----+
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key.pub
The key fingerprint is:
SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw runner@fv-az524-896
The key's randomart image is:
+---[RSA 3072]----+
|==o..            |
|+.+. o   .       |
| +  +   o +      |
|  oo ..+ o =     |
| .  o +oS *     E|
|  . oo.o.= +   + |
|   o .ooo.=..   +|
|    . =oo.o  . . |
|    .+.=..    .  |
+----[SHA256]-----+
---------------------------- Captured stderr setup -----------------------------
debug2: load_server_config: filename /dev/null
debug2: load_server_config: done config len = 1
debug2: parse_server_config_depth: config /dev/null len 1
debug1: sshd version OpenSSH_8.2, OpenSSL 1.1.1f  31 Mar 2020
debug1: private host key #0: ssh-rsa SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw
debug1: setgroups() failed: Operation not permitted
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-f'
debug1: rexec_argv[3]='/dev/null'
debug1: rexec_argv[4]='-o'
debug1: rexec_argv[5]='LogLevel=DEBUG3'
debug1: rexec_argv[6]='-o'
debug1: rexec_argv[7]='HostKey=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key'
debug1: rexec_argv[8]='-o'
debug1: rexec_argv[9]='PidFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/sshd.pid'
debug1: rexec_argv[10]='-o'
debug1: rexec_argv[11]='UsePAM=yes'
debug1: rexec_argv[12]='-o'
debug1: rexec_argv[13]='PasswordAuthentication=no'
debug1: rexec_argv[14]='-o'
debug1: rexec_argv[15]='ChallengeResponseAuthentication=no'
debug1: rexec_argv[16]='-o'
debug1: rexec_argv[17]='GSSAPIAuthentication=no'
debug1: rexec_argv[18]='-o'
debug1: rexec_argv[19]='StrictModes=no'
debug1: rexec_argv[20]='-o'
debug1: rexec_argv[21]='PermitEmptyPasswords=yes'
debug1: rexec_argv[22]='-o'
debug1: rexec_argv[23]='PermitRootLogin=yes'
debug1: rexec_argv[24]='-o'
debug1: rexec_argv[25]='Protocol=2'
debug1: rexec_argv[26]='-o'
debug1: rexec_argv[27]='HostbasedAuthentication=no'
debug1: rexec_argv[28]='-o'
debug1: rexec_argv[29]='IgnoreUserKnownHosts=yes'
debug1: rexec_argv[30]='-o'
debug1: rexec_argv[31]='Port=43559'
debug1: rexec_argv[32]='-o'
debug1: rexec_argv[33]='ListenAddress=127.0.0.1'
debug1: rexec_argv[34]='-o'
debug1: rexec_argv[35]='AuthorizedKeysFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/authorized_keys'
debug1: rexec_argv[36]='-o'
debug1: rexec_argv[37]='AcceptEnv=LANG LC_*'
debug1: rexec_argv[38]='-o'
debug1: rexec_argv[39]='Subsystem=sftp internal-sftp'
Warning: Permanently added '[127.0.0.1]:43559' (RSA) to the list of known hosts.
=================================== FAILURES ===================================
______________________________ test_exec_command _______________________________
[gw0] linux -- Python 3.8.18 /home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/bin/python

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>

    def test_exec_command(ssh_channel):
        """Test getting the output of a remotely executed command."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello World').stdout.decode()
        assert u_cmd_out == u'Hello World'  # noqa: WPS302
        # Test that repeated calls to exec_command do not segfault.
    
        # NOTE: Call `exec_command()` once again from another function to
        # NOTE: force it to happen in another place of the call stack,
        # NOTE: making sure that the context is different from one in this
        # NOTE: this test function. The resulting call stack will end up
        # NOTE: being more random.
>       exec_second_command(ssh_channel)

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
u_cmd_out  = 'Hello World'

tests/unit/channel_test.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>

    def exec_second_command(ssh_channel):
        """Call ``exec_command()`` and checks the output."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello Again').stdout.decode()
>       assert u_cmd_out == u'Hello Again'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello Again'
E         
E         - Hello Again

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
u_cmd_out  = ''

tests/unit/channel_test.py:39: AssertionError
---------------------------- Captured stdout setup -----------------------------
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key.pub
The key fingerprint is:
SHA256:Us8tQjYuzYhTfcblu4lb5wrgGe2ldOtFQ++NgSVUrug ansible-pylibssh integration tests key
The key's randomart image is:
+---[RSA 8192]----+
|            o..  |
|       . . + .   |
|      . * + o +  |
|     o O B o B . |
|    o + S B B + .|
|     . + O * = =.|
|        o E = + o|
|           = +   |
|          . o..  |
+----[SHA256]-----+
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key.pub
The key fingerprint is:
SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw runner@fv-az524-896
The key's randomart image is:
+---[RSA 3072]----+
|==o..            |
|+.+. o   .       |
| +  +   o +      |
|  oo ..+ o =     |
| .  o +oS *     E|
|  . oo.o.= +   + |
|   o .ooo.=..   +|
|    . =oo.o  . . |
|    .+.=..    .  |
+----[SHA256]-----+
---------------------------- Captured stderr setup -----------------------------
debug2: load_server_config: filename /dev/null
debug2: load_server_config: done config len = 1
debug2: parse_server_config_depth: config /dev/null len 1
debug1: sshd version OpenSSH_8.2, OpenSSL 1.1.1f  31 Mar 2020
debug1: private host key #0: ssh-rsa SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw
debug1: setgroups() failed: Operation not permitted
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-f'
debug1: rexec_argv[3]='/dev/null'
debug1: rexec_argv[4]='-o'
debug1: rexec_argv[5]='LogLevel=DEBUG3'
debug1: rexec_argv[6]='-o'
debug1: rexec_argv[7]='HostKey=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key'
debug1: rexec_argv[8]='-o'
debug1: rexec_argv[9]='PidFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/sshd.pid'
debug1: rexec_argv[10]='-o'
debug1: rexec_argv[11]='UsePAM=yes'
debug1: rexec_argv[12]='-o'
debug1: rexec_argv[13]='PasswordAuthentication=no'
debug1: rexec_argv[14]='-o'
debug1: rexec_argv[15]='ChallengeResponseAuthentication=no'
debug1: rexec_argv[16]='-o'
debug1: rexec_argv[17]='GSSAPIAuthentication=no'
debug1: rexec_argv[18]='-o'
debug1: rexec_argv[19]='StrictModes=no'
debug1: rexec_argv[20]='-o'
debug1: rexec_argv[21]='PermitEmptyPasswords=yes'
debug1: rexec_argv[22]='-o'
debug1: rexec_argv[23]='PermitRootLogin=yes'
debug1: rexec_argv[24]='-o'
debug1: rexec_argv[25]='Protocol=2'
debug1: rexec_argv[26]='-o'
debug1: rexec_argv[27]='HostbasedAuthentication=no'
debug1: rexec_argv[28]='-o'
debug1: rexec_argv[29]='IgnoreUserKnownHosts=yes'
debug1: rexec_argv[30]='-o'
debug1: rexec_argv[31]='Port=43559'
debug1: rexec_argv[32]='-o'
debug1: rexec_argv[33]='ListenAddress=127.0.0.1'
debug1: rexec_argv[34]='-o'
debug1: rexec_argv[35]='AuthorizedKeysFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/authorized_keys'
debug1: rexec_argv[36]='-o'
debug1: rexec_argv[37]='AcceptEnv=LANG LC_*'
debug1: rexec_argv[38]='-o'
debug1: rexec_argv[39]='Subsystem=sftp internal-sftp'
Warning: Permanently added '[127.0.0.1]:43559' (RSA) to the list of known hosts.
- generated xml file: /home/runner/work/pylibssh/pylibssh/.test-results/pytest/results.xml -
============================= slowest 10 durations =============================
33.40s setup    tests/unit/scp_test.py::test_get
29.28s setup    tests/unit/scp_test.py::test_get_large
22.48s setup    tests/unit/channel_test.py::test_recv_eof
22.26s setup    tests/unit/scp_test.py::test_get_existing_local
16.19s setup    tests/unit/channel_test.py::test_channel_exit_status
12.88s setup    tests/unit/channel_test.py::test_send_eof
12.46s setup    tests/unit/scp_test.py::test_copy_from_non_existent_remote_path
12.35s setup    tests/unit/channel_test.py::test_request_exec
11.33s setup    tests/unit/channel_test.py::test_send_signal
11.08s setup    tests/unit/channel_test.py::test_is_eof
=========================== short test summary info ============================
ERROR tests/unit/channel_test.py::test_exec_command - SystemError: <cyfunctio...
FAILED tests/unit/channel_test.py::test_exec_command - AssertionError: assert...
=============== 1 failed, 24 passed, 1 error in 67.99s (0:01:07) ===============

tests/unit/channel_test.py

Jakuje · 2024-11-25T14:54:49Z

I have the feeling there is some issue in how the references are counted or some timing/synchronization issue. The current code works for me locally as well as in manylinux, but fails in rpm build of Fedora 40+ fails already with the first command:

>       assert u_cmd.stdout.decode() == u'Hello World'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello World'
E         - Hello World

but on Fedora 39 fails on the second one:

>       assert u_cmd.stdout.decode() == u'Hello Again'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello Again'
E         - Hello Again

Jakuje · 2024-12-22T21:24:09Z

Seems like I can see the issue now. The problem is that the Chanel::exec_command() sends the command, but does not wait for the server reply and basically immediately sends the EOF. The call to the ssh_channel_get_exit_status() add some synchronization, meaning that it tries to read the outstanding packets, but does not guarantee that all of the data is processed. In the last run, I can see that the messages from the OpenSSH server are received in this order:

(first exit status)
[2024/12/22 18:51:16.267580, 3] channel_rcv_request:  received exit-status 0
(second data we look for)
[2024/12/22 18:51:16.267604, 3] channel_default_bufferize:  placing 11 bytes into channel buffer (stdout)
(eof, close)
[2024/12/22 18:51:16.267656, 3] channel_rcv_eof:  Received eof on channel (44:1)
[...]
[2024/12/22 18:51:16.267670, 3] channel_rcv_close:  Received close on channel (44:1)

While naively, I would expect the first two messages should come in reversed direction. But I do not see this mandated anywhere in the specs. The openssh sends this message basically async, when the child process dies:

https://github.com/openssh/openssh-portable/blob/826483d51a9fee60703298bbf839d9ce37943474/session.c#L2369

My suspicion is that this is a timing/scheduling issue and when openssh sends the messages in this order, we will not process both output and exit code and close the channel earlier than we could read the output.

Pushed a change that should address this issue, I hope.

Jakuje · 2024-12-23T08:10:37Z

This looks like it did not help. Reading through the code and through the logs, another data point might be that the ssh_channel_request_exec might already invoke some callbacks and if they are not registered at that time, we might lose some data. Trying with setting the callback earlier.

Jakuje · 2024-12-23T09:37:00Z

This sounds like it finally made it through. Now, there is just the broken ubi, which I likely broke with the removal of the pytest-forked.

Jakuje · 2024-12-23T10:06:06Z

The error in ubi looked like this:

+ /usr/bin/python3 -m build --wheel --skip-dependencies --no-isolation .
[1/1] Cythonizing /tmp/.tmp-ansible-pylibssh-pep517-84wt4gfm/src/src/pylibsshext/errors.pyx
/github/home/rpmbuild/BUILD/ansible-pylibssh-1.2.3.dev87+ga7eb975/bin/setuptools_scm/_integration/setuptools.py:31: RuntimeWarning: 
ERROR: setuptools==56.0.0 is used in combination with setuptools_scm>=8.x

Your build configuration is incomplete and previously worked by accident!
setuptools_scm requires setuptools>=61

Suggested workaround if applicable:
 - migrating from the deprecated setup_requires mechanism to pep517/518
   and using a pyproject.toml to declare build dependencies
   which are reliably pre-installed before running the build tools

(not sure if this is the relevant part though)

It worked last month, but I do not see any logs from the ubi runs. lets try with the pytest-forked (with some cleanup of the commit series)

Jakuje · 2024-12-23T10:18:22Z

now, rpmfind is down causing the ubi build failure ... probably giving up for today, but it should be in the shape for a reviews

The Protocol=2 is default and the current implementation does not even support any other protocol version. Using this just makes it confusing to readers Signed-off-by: Jakub Jelen <[email protected]>

When the OpenSSH server runs in daemon mode, the logs are written into syslog. This is not suitable for tests as we can not see what is going on there. The captured logs are only from the main sshd process and does not show any useful information about the connection and what we actually test. The trick to get debug logs is either to not run in deamon mode, but it would accept only one connection, or use the `-E` to write logs from all connections to the given file. Signed-off-by: Jakub Jelen <[email protected]>

Even though the PAM is required for the server to correctly work and integrate well to the OS, for testing, PAM introduces needless complexity and it is much better to not to use it. The tests are running under non-root user so the OpenSSH won't use PAM anyway. Signed-off-by: Jakub Jelen <[email protected]>

This simplifies debugging of the failed tests. With previous changes, we have already debug logs from the server. This prints also debug logs from the client and allows us to align sent and received protocol messages. Signed-off-by: Jakub Jelen <[email protected]>

Previously, stack-local variables held the callback structure. This worked as long as the call stack was not overridden by other function calls or as long as the server closed the channel very early. In other cases, the delayed close in libssh could result in calling the callbacks even after we believed the channel was freed, causing invalid memory access and crashes. The accompanying change to libssh was merged to avoid calling callbacks on channels the caller considers freed here: https://gitlab.com/libssh/libssh-mirror/-/merge_requests/549/ But we will be using older libssh versions for some time so we need a workaround in pylibssh too. Fixes ansible#57 Signed-off-by: Jakub Jelen <[email protected]>

Surprisingly, the SSH protocol does not define if the exit-code notification can come before or after the data. Depending on the server, scheduler or current mood, the exit-code might arrive earlier than the data and if we close the channel right away, we will lose some output. This changes the code to correctly wait for the EOF before closing the channel and returning. Signed-off-by: Jakub Jelen <[email protected]>

Depending on scheduler, the ssh_channel_request_exec() might already process some of the data from the peer and we could lose some data from peer. By setting the callback before calaling this function, we make sure callbacks are correctly triggered on all the data. Signed-off-by: Jakub Jelen <[email protected]>

Removes the failed marker as it works now. Signed-off-by: Jakub Jelen <[email protected]>

Signed-off-by: Jakub Jelen <[email protected]>

sonarqubecloud · 2024-12-23T13:15:01Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Jakuje · 2024-12-23T13:39:18Z

Ok, so now only the ubi is broken (and not because of the pytest-forked, which can be now safely removed). This talks about deprecated setuptools, which I believe should be addressed outside of this PR as it is not related to this issue.

Jakuje force-pushed the crash-exec branch from 63b8ab7 to 5edf773 Compare November 14, 2024 14:27

psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Nov 14, 2024

Jakuje force-pushed the crash-exec branch from 8c72faa to 1d93243 Compare November 15, 2024 08:19

webknjaz reviewed Nov 15, 2024

View reviewed changes

tests/unit/channel_test.py Outdated Show resolved Hide resolved

webknjaz reviewed Nov 15, 2024

View reviewed changes

docs/changelog-fragments/658.bugfix.rst Outdated Show resolved Hide resolved

Jakuje force-pushed the crash-exec branch 3 times, most recently from 1de5bc0 to 251bb6d Compare November 19, 2024 10:12

Jakuje marked this pull request as ready for review November 19, 2024 10:12

Jakuje force-pushed the crash-exec branch from 3cabd51 to 805f0a0 Compare November 19, 2024 10:14

webknjaz reviewed Nov 19, 2024

View reviewed changes

tests/unit/channel_test.py Outdated Show resolved Hide resolved

webknjaz reviewed Nov 19, 2024

View reviewed changes

src/pylibsshext/channel.pyx Outdated Show resolved Hide resolved

webknjaz reviewed Nov 19, 2024

View reviewed changes

tests/unit/channel_test.py Outdated Show resolved Hide resolved

webknjaz reviewed Nov 19, 2024

View reviewed changes

src/pylibsshext/channel.pyx Outdated Show resolved Hide resolved

webknjaz changed the title ~~Prevent crash on consecutive exec_command()~~ Prevent SEGFAULTs on consecutive exec_command() invocations Nov 19, 2024

webknjaz mentioned this pull request Nov 19, 2024

exec_command segfault partial fix #280

Merged

Jakuje force-pushed the crash-exec branch 4 times, most recently from 9bc197c to 9a6e7d6 Compare November 20, 2024 14:05

webknjaz reviewed Nov 20, 2024

View reviewed changes

tests/unit/channel_test.py Outdated Show resolved Hide resolved

webknjaz reviewed Nov 20, 2024

View reviewed changes

tests/unit/channel_test.py Show resolved Hide resolved

Jakuje force-pushed the crash-exec branch 2 times, most recently from a2fcf3c to 9a007ea Compare November 25, 2024 12:47

Jakuje force-pushed the crash-exec branch from ebc990b to 6762007 Compare December 23, 2024 10:03

Jakuje added 12 commits December 23, 2024 14:11

tests: Remove needles sshd options

781f3cd

The Protocol=2 is default and the current implementation does not even support any other protocol version. Using this just makes it confusing to readers Signed-off-by: Jakub Jelen <[email protected]>

tests: Adjusted reproducer for ansible#57

e2adb74

Removes the failed marker as it works now. Signed-off-by: Jakub Jelen <[email protected]>

tests: Check exit code of command

41e315d

Signed-off-by: Jakub Jelen <[email protected]>

tests: Check stderr of exec_command

0e0cd94

Signed-off-by: Jakub Jelen <[email protected]>

tests: Test capturing stderr

f1a39d5

Signed-off-by: Jakub Jelen <[email protected]>

Add changelog

5d66f2c

Signed-off-by: Jakub Jelen <[email protected]>

Jakuje force-pushed the crash-exec branch from 6762007 to 5d66f2c Compare December 23, 2024 13:14

Jakuje mentioned this pull request Dec 23, 2024

Fix pushing large files over SCP #661

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Jakuje commented Nov 14, 2024

packit-as-a-service bot commented Nov 14, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 20, 2024

Jakuje commented Nov 25, 2024

Jakuje commented Dec 22, 2024

Jakuje commented Dec 23, 2024

Jakuje commented Dec 23, 2024

Jakuje commented Dec 23, 2024

Jakuje commented Dec 23, 2024

sonarqubecloud bot commented Dec 23, 2024

Jakuje commented Dec 23, 2024

Prevent SEGFAULTs on consecutive exec_command() invocations #658

Are you sure you want to change the base?

Prevent SEGFAULTs on consecutive exec_command() invocations #658

Conversation

Jakuje commented Nov 14, 2024

SUMMARY

ISSUE TYPE

ADDITIONAL INFORMATION

packit-as-a-service bot commented Nov 14, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 20, 2024

Jakuje commented Nov 25, 2024

Jakuje commented Dec 22, 2024

Jakuje commented Dec 23, 2024

Jakuje commented Dec 23, 2024

Jakuje commented Dec 23, 2024

Jakuje commented Dec 23, 2024

sonarqubecloud bot commented Dec 23, 2024

Quality Gate passed

Jakuje commented Dec 23, 2024

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658