Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with long timeouts in TCP connect() #210

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

MarkRivers
Copy link
Member

This PR sets the sockopt SO_SNDTIMEO to the time set by pasynManager->setAutoConnectTimeout() during the call to connect(). This allows must faster timeouts than the system default. It has been tested on Linux and Windows.

@AppVeyorBot
Copy link

Build asyn 1.0.272 failed (commit 770913eff7 by @MarkRivers)

@@ -520,6 +532,9 @@ connectIt(void *drvPvt, asynUser *pasynUser)
tty->flags |= FLAG_NEED_LOOKUP;
return asynError;
}
if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, (char *)&saveTV, sizeof saveTV) < 0) {
asynPrint(pasynUser, ASYN_TRACE_ERROR, "connectIt, error calling setsockopt for SO_RECVTIMEO: %s\n", strerror(SOCKERRNO));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 3 error messages are using the wrong option name (RECV instead of SND).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed that, thanks for catching it.

It does not appear to be supported for connect() on Windows, and takes a
different argument.
Note sure about vxWorks and RTEMS, they could be added later.
Fix typo in error messages.
@MarkRivers
Copy link
Member Author

I read the documentation for SO_SNDTIMEO on Windows.

  • It takes a DWORD value, not timeval like Linux.
  • There is no mention that it applies to connect() calls.

I have restricted the use of SO_SNDTIMEO to Linux. RTEMS, vxWorks, and Darwin could be added if it is implemented on them.

@ericonr
Copy link
Contributor

ericonr commented Nov 6, 2024

If we want a solution that could work on more platforms, why not use poll? Like https://stackoverflow.com/a/61960339 does it (I don't know if the signal handling is even necessary for asyn, which would greatly simplify that code).

Especially since Linux always supports poll, it would be a single code path for most platforms.

@anjohnson
Copy link
Member

If we want a solution that could work on more platforms, why not use poll?

Especially since Linux always supports poll, it would be a single code path for most platforms.

The poll() API isn't available on all the supported EPICS platforms, in particular RTEMS and VxWorks, many EPICS users are still running Asyn on them. The PVXS implementation of PVA is based on libevent-2 which could be a candidate in the future, it runs on RTEMS and we'll be dropping VxWorks in EPICS 7.1.

Copy link
Member

@anjohnson anjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. I assume you'll be documenting the additional asynManager method before release.

@MarkRivers
Copy link
Member Author

I am thinking about Darwin. This document
https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/setsockopt.2.html
says:

     SO_SNDTIMEO is an option to set a timeout value for output operations.
     It accepts a struct timeval parameter with the number of seconds and
     microseconds used to limit waits for output operations to complete.  If a
     send operation has blocked for this much time, it returns with a partial
     count or with the error EWOULDBLOCK if no data were sent.  In the current
     implementation, this timer is restarted each time additional data are
     delivered to the protocol, implying that the limit applies to output por-tions portions
     tions ranging in size from the low-water mark to the high-water mark for
     output.

It is not clear to me whether or not this applies to connect(). I think I will enable it for darwin. Can anyone test?

@anjohnson
Copy link
Member

I think I will enable it for darwin. Can anyone test?

I can run tests on Darwin-x86 and/or Darwin-aarch64 if you can tell me exactly what needs doing.

You could also add a MacOS entry to the .github/workflows/ci-scripts.yml file if you were to implement those steps as a test program.

BTW there seems to be a duplicate job defined in the ci-scripts.yml job matrix, the second and third jobs there are identical.

@ericonr
Copy link
Contributor

ericonr commented Nov 6, 2024

The poll() API isn't available on all the supported EPICS platforms, in particular RTEMS and VxWorks, many EPICS users are still running Asyn on them.

@anjohnson yes, but this function already uses #ifdef USE_POLL elsewhere, and the timeout application is conditional anyhow...

Right now we have linux and (possibly) MacOS support. If we use poll(), we don't need to encode platform knowledge anywhere, we can simply support any platform with poll (which is a superset of linux+MacOS, even if it doesn't include RTEMS/VxWorks).

And since the #ifdef USE_POLL code block is for setting the socket to nonblocking, it could simply be moved upwards in the function and also perform a connection with timeout application.

@MarkRivers
Copy link
Member Author

@ericonr I would be happy to look at an alternative PR that is tested on Linux, Windows, and Darwin.

@ericonr
Copy link
Contributor

ericonr commented Nov 6, 2024

@MarkRivers @anjohnson would you mind taking a look at the latest commit in https://github.com/lnls-dig/asyn/commits/nonblocking-connect-poll/ ?

I don't have an easily available Windows machine to test (much less MacOS), but if you are okay with the approach there, I can set up a Windows environment in order to test it there.

@AppVeyorBot
Copy link

Build asyn 1.0.275 failed (commit f087da5a63 by @MarkRivers)

@mdavidsaver
Copy link
Contributor

... look at the latest commit ...

@ericonr You probably want to put a loop around poll() to handle a -1 return with errno==EINTR.

... tested on Linux, Windows, and Darwin.
... I don't have an easily available Windows machine ...

Do any of the unit tests cover IPPort? If so, then maybe this is already covered by a CI build.

@AppVeyorBot
Copy link

Build asyn 1.0.277 failed (commit 8059e41b78 by @MarkRivers)

@AppVeyorBot
Copy link

Build asyn 1.0.279 failed (commit 81ad3bbcdf by @MarkRivers)

@AppVeyorBot
Copy link

Build asyn 1.0.281 failed (commit 3f2175fb92 by @MarkRivers)

@ericonr
Copy link
Contributor

ericonr commented Nov 13, 2024

@ericonr You probably want to put a loop around poll() to handle a -1 return with errno==EINTR.

@mdavidsaver That would require using considerably more code to sleep for the correct amount of time, to support an abnormal case (if this connection is interrupted, it can simply be tried again). Furthermore, the version here, which uses blocking connect with a timeout, is subject to being interrupted by signals as well, no?

@mdavidsaver
Copy link
Contributor

@ericonr To avoid confusion, it might be a good idea to move this discussion to a draft PR for your alternate change.

... blocking connect with a timeout, is subject to being interrupted by signals as well, no?

You make a fair point. I suppose looping could be omitted so long as the poll()==-1 case is treated as an error, which I think is the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

asynOctetSetOutputEos and asynOctetSetInputEos block IOC init and exit if no device connected
6 participants