Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testprocess: process_testStdinToStdout hanging when run non-interactively #11006

Closed
smcv opened this issue Sep 30, 2024 · 11 comments · Fixed by #11061
Closed

testprocess: process_testStdinToStdout hanging when run non-interactively #11006

smcv opened this issue Sep 30, 2024 · 11 comments · Fixed by #11061
Milestone

Comments

@smcv
Copy link
Contributor

smcv commented Sep 30, 2024

When I run testprocess manually, it all succeeds and is fine, but when I run it from ctest as part of build-time testing in a Debian package, or when I run it as part of ginsttest-runner as an "as-installed" test, it fails.

The last output I see from it is:

21:  09/30/24 14:58:34: �[0;93m----- Test Case 1.6: 'process_testStdinToStdout' started�[0m
21:  09/30/24 14:58:34: Test Description: 'Test writing to stdin and reading from stdout'
21:  09/30/24 14:58:34: Test Iteration 1: execKey 1023637259762064532
21:  09/30/24 14:58:34: Assert 'SDL_CreateProcessWithProperties()': �[0;32mPassed�[0m
21:  09/30/24 14:58:34: Assert 'SDL_GetProcessProperties()': �[0;32mPassed�[0m
21:  09/30/24 14:58:34: Assert 'Checking process ID, expected non-zero, got 17602': �[0;32mPassed�[0m
21:  09/30/24 14:58:34: Assert 'SDL_GetPointerProperty(SDL_PROP_PROCESS_STDIN_POINTER) returns a valid IO stream': �[0;32mPassed�[0m
21:  09/30/24 14:58:34: Assert 'SDL_GetPointerProperty(SDL_PROP_PROCESS_STDOUT_POINTER) returns a valid IO stream': �[0;32mPassed�[0m
21:  09/30/24 14:58:34: Assert 'SDL_GetPointerProperty(SDL_PROP_PROCESS_STDERR_POINTER) returns NULL': �[0;32mPassed�[0m

suggesting that it has got stuck somewhere in the loop where it reads and writes stdin and stdout.

@slouken slouken added this to the 3.2.0 milestone Sep 30, 2024
smcv added a commit to smcv/SDL that referenced this issue Sep 30, 2024
smcv added a commit to smcv/SDL that referenced this issue Sep 30, 2024
The result of the `%` operator will be in the range 0 to 31, so it would
never be 32, but it can be 0.

Helps: libsdl-org#11006
Signed-off-by: Simon McVittie <[email protected]>
smcv added a commit to smcv/SDL that referenced this issue Sep 30, 2024
This will make it more obvious when writing has finished and we are only
waiting for reading.

Helps: libsdl-org#11006
Signed-off-by: Simon McVittie <[email protected]>
smcv added a commit to smcv/SDL that referenced this issue Sep 30, 2024
smcv added a commit to smcv/SDL that referenced this issue Sep 30, 2024
@smcv
Copy link
Contributor Author

smcv commented Sep 30, 2024

#11007 will hopefully help to debug this. It might be too verbose to leave it all applied, I don't know.

@smcv
Copy link
Contributor Author

smcv commented Sep 30, 2024

21:  09/30/24 15:27:48: �[0;93m----- Test Case 1.6: 'process_testStdinToStdout' started�[0m
21:  09/30/24 15:27:48: Test Description: 'Test writing to stdin and reading from stdout'
21:  09/30/24 15:27:48: Test Iteration 1: execKey 6318848106709083947
21:  09/30/24 15:27:48: Assert 'SDL_CreateProcessWithProperties()': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: Assert 'SDL_GetProcessProperties()': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: Assert 'Checking process ID, expected non-zero, got 17627': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: Assert 'SDL_GetPointerProperty(SDL_PROP_PROCESS_STDIN_POINTER) returns a valid IO stream': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: Assert 'SDL_GetPointerProperty(SDL_PROP_PROCESS_STDOUT_POINTER) returns a valid IO stream': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: Assert 'SDL_GetPointerProperty(SDL_PROP_PROCESS_STDERR_POINTER) returns NULL': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (0th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (0th time), total written 65536
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (0th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (0th time), total read 0
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (1th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 4096 (1th time), total written 69632
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (1th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (1th time), total read 0
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (2th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 4096 (2th time), total written 73728
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (2th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (2th time), total read 0
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (3th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (3th time), total written 139264
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (3th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (3th time), total read 65504
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (4th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (4th time), total written 204800
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (4th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (4th time), total read 131008
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (5th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 81920 (5th time), total written 286720
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (5th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (5th time), total read 196512
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (6th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (6th time), total written 352256
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (6th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (6th time), total read 262016
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (7th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (7th time), total written 417792
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (7th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (7th time), total read 327520
...
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (8th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (8th time), total written 483328
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (8th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (8th time), total read 393024
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (9th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (9th time), total written 548864
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (9th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (9th time), total read 458528
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (10th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (10th time), total written 614400
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (10th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (10th time), total read 524032
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (11th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (11th time), total written 679936
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (11th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (11th time), total read 589536
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (12th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (12th time), total written 745472
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (12th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (12th time), total read 655040
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (13th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (13th time), total written 811008
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (13th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (13th time), total read 720544
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (14th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (14th time), total written 876544
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (14th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (14th time), total read 786048
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (15th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (15th time), total written 942080
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (15th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (15th time), total read 851552
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (16th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 65536 (16th time), total written 1007616
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (16th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (16th time), total read 917056
21:  09/30/24 15:27:48: Assert 'About to SDL_WriteIO (17th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_WriteIO() -> 40960 (17th time), total written 1048576
21:  09/30/24 15:27:48: All data written to stdin
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (17th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 65504 (17th time), total read 982560
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (18th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 49632 (18th time), total read 1032192
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (19th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (19th time), total read 1032192
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (20th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (20th time), total read 1032192
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (21th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (21th time), total read 1032192
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (22th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (22th time), total read 1032192
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (23th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (23th time), total read 1032192
21:  09/30/24 15:27:48: Assert 'About to SDL_ReadIO (24th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:48: SDL_ReadIO() -> 0 (24th time), total read 1032192

… and then nothing more is read until:

21:  09/30/24 15:27:58: Assert 'About to SDL_ReadIO (937th time)': �[0;32mPassed�[0m
21:  09/30/24 15:27:58: SDL_ReadIO() -> 0 (937th time), total read 1032192
21/23 Test #21: testprocess ......................***Timeout  10.02 sec

@smcv
Copy link
Contributor Author

smcv commented Sep 30, 2024

One thing that I notice is that the stdio-based read/write interface does this:

static size_t SDLCALL stdio_read(void *userdata, void *ptr, size_t size, SDL_IOStatus *status)
{
    IOStreamStdioData *iodata = (IOStreamStdioData *) userdata;
    const size_t bytes = fread(ptr, 1, size, iodata->fp);
    if (bytes == 0 && ferror(iodata->fp)) {
        if (errno == EAGAIN) {
            *status = SDL_IO_STATUS_NOT_READY;
            clearerr(iodata->fp);
        } else {
            SDL_SetError("Error reading from datastream: %s", strerror(errno));
        }
    }
    return bytes;
}

static size_t SDLCALL stdio_write(void *userdata, const void *ptr, size_t size, SDL_IOStatus *status)
{
    IOStreamStdioData *iodata = (IOStreamStdioData *) userdata;
    const size_t bytes = fwrite(ptr, 1, size, iodata->fp);
    if (bytes == 0 && ferror(iodata->fp)) {
        if (errno == EAGAIN) {
            *status = SDL_IO_STATUS_NOT_READY;
            clearerr(iodata->fp);
        } else {
            SDL_SetError("Error writing to datastream: %s", strerror(errno));
        }
    }
    return bytes;
}

But on my Linux system, the ferror() man page explicitly documents "These functions should not fail and do not set errno" (my emphasis), so there's no guarantee that errno will be meaningful; and the fread(), fwrite() man pages do not document whether they set errno or not.

Perhaps on Unix systems it would be better to have a file-descriptor-backed SDL_IOStream implementation that uses read() and write() directly, and use that when dealing with pipes?

@madebr
Copy link
Contributor

madebr commented Sep 30, 2024

In 7241dd9, I changed childprocess.c from character based to memory block based io: it's faster.
Its stdin is also made non-blocking (otherwise a buffer size in the child process that is not a multiples of the page size would cause a blocking wait on more data).

Does reverting childprocess to character based io make any difference?

@smcv
Copy link
Contributor Author

smcv commented Sep 30, 2024

After hacking in some code to close the stdin fd as soon as we have written everything out to the stdout fd, it looks as though a block of 8192 bytes from the middle of the input has got lost from the output, with everything after that point shifted 8192 bytes to fill the gap, and zeroes at the end of the buffer. I don't know which stage this is a problem with:

test write to stdin -> childprocess input -> childprocess output -> test read from stdout

(Unfortunately the only reliable reproducer I have for this is noninteractive.)

If I can't see an obvious problem from another look at the code, I'll see what happens if I make childprocess go back to doing I/O a byte at a time.

@smcv
Copy link
Contributor Author

smcv commented Sep 30, 2024

I changed childprocess.c from character based to memory block based io: it's faster

Is it definitely faster in practice, or just theoretically faster? (I ask because fgetc and fputc are buffered I/O, so they are still being buffered at the libc level - although the use of fflush defeats that on the write side)

Its stdin is also made non-blocking

Really? On POSIX, it looks to me as though the changes in that commit are making it blocking (the O_NONBLOCK bit is being turned off).

@madebr
Copy link
Contributor

madebr commented Sep 30, 2024

I changed childprocess.c from character based to memory block based io: it's faster

Is it definitely faster in practice, or just theoretically faster? (I ask because fgetc and fputc are buffered I/O, so they are still being buffered at the libc level - although the use of fflush defeats that on the write side)

0.2s vs 0.7s on my system.

Its stdin is also made non-blocking

Really? On POSIX, it looks to me as though the changes in that commit are making it blocking (the O_NONBLOCK bit is being turned off).

Oh damn, you're right.
Changing it to set the O_NONBLOCK bit, SDLTest_CompareMemory fails for me.

@smcv
Copy link
Contributor Author

smcv commented Sep 30, 2024

I'm not sure whether the result of combining buffered I/O with O_NONBLOCK is even well-defined... O_NONBLOCK is more of a read()/write()-level thing.

@slouken
Copy link
Collaborator

slouken commented Oct 2, 2024

It sounds like we do want read/write based I/O here. I was trying to be clever and switched from file descriptor to FILE implementation, but maybe we should switch it back.

slouken added a commit to slouken/SDL that referenced this issue Oct 4, 2024
slouken added a commit to slouken/SDL that referenced this issue Oct 4, 2024
slouken added a commit to slouken/SDL that referenced this issue Oct 4, 2024
slouken added a commit to slouken/SDL that referenced this issue Oct 4, 2024
slouken added a commit to slouken/SDL that referenced this issue Oct 4, 2024
Use low level non-blocking I/O for process pipe streams. This may fix issues with data not getting through the pipe occasionally.

Related: libsdl-org#11006
slouken added a commit that referenced this issue Oct 4, 2024
Use low level non-blocking I/O for process pipe streams. This may fix issues with data not getting through the pipe occasionally.

Related: #11006
@smcv
Copy link
Contributor Author

smcv commented Oct 4, 2024

On Unix, I think childprocess.c should either use the default blocking I/O together with fread and fwrite (which in fact it does, but possibly only by mistake, because it's clearing a flag in code that maybe meant to set it), or it should use non-blocking I/O with read and write. I would personally keep it simple and use blocking I/O in childprocess.c.

There are several layers going on here. In Unix world, the layers go like this:

A pipe is basically some memory allocated inside the kernel, with file descriptors to talk to its two ends.

read() and write() are direct system calls into the kernel, and they read and write directly between the memory buffer inside the kernel and the caller's memory, without any additional buffering in user-space. The O_NONBLOCK flag is a kernel thing, which changes what happens if we try to read() when the kernel's buffer is empty, or if we try to write() when the kernel's buffer is full. By default, the syscall blocks until either EOF is reached, or there is something that it can do. If we set O_NONBLOCK, any time it would otherwise have blocked, instead it fails with EAGAIN and we have to retry later.

In a better kernel design, whether to block would be a flag that we pass to the syscall, like waitpid with its WNOHANG flag, rather than a piece of out-of-band state on the fd - but it's easy to say that with the benefit of hindsight, which was not available when this stuff was designed in the 1980s.

stdio (fread() and fwrite()) adds an extra layer of buffering inside the application, to make reads and writes "cheaper" by combining several application-layer reads and writes into a single syscall: for example each fread() reads from a cache of data that was already read, and the C library only does a new read() if the cache is empty. However, if we're using stdio, then it's important to use stdio consistently: if we go behind its back, like if we mix fread() with read(), then we can end up seeing incoming data in a mixed-up order, because the read() can give us newer data than what's already cached in the stdio buffer.

The combinations are:

  • blocking and direct read()/write(): OK, very simple
  • non-blocking and direct read()/write(): OK, simple (slightly different event loop design)
  • blocking and stdio: OK, extra layer of latency but often better throughput
  • non-blocking and stdio: maybe not OK? I am not sure what the interaction between O_NONBLOCK and stdio is, and we should probably try not to have to find out!

If you are using stdio, then setvbuf() and similar functions adjust the user-space buffering behaviour (nothing to do with the kernel!), either unbuffered (basically turns stdio into read()/write() but with extra steps) or line-buffered (try to buffer until a newline is seen) or fully buffered (what you want for e.g. high-throughput copying of binaries). By default, stderr is unbuffered (so that you see diagnostic information that might be relevant to a crash as soon as possible), and stdin/stdout are either line-buffered if they point to a terminal (for user friendliness), or fully buffered otherwise (for throughput).

On Windows, childprocess.c is using SetConsoleMode. I don't think this is actually equivalent to O_NONBLOCK! I think it's more like the equivalent of using setvbuf to set stdio to be unbuffered (_IONBF).

On Unix, childprocess.c is explicitly turning off O_NONBLOCK, which might be redundant. If it was intended to be turning on O_NONBLOCK, then I think it should certainly not do that, because the copying loop is using stdio.

@slouken
Copy link
Collaborator

slouken commented Oct 4, 2024

I think the intent is for the child process to use stdio and flush the output. I’ll double check that when I get back to my desk.

klukaszek pushed a commit to klukaszek/SDL that referenced this issue Nov 17, 2024
Use low level non-blocking I/O for process pipe streams. This may fix issues with data not getting through the pipe occasionally.

Related: libsdl-org#11006
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants