getaddrinfo fails on Illumos, inconsistent elsewhere if port is, but type & protocol are not specified #123832
Comments
PoC
--- socket.py.orig Tue Mar 26 20:30:27 2024
+++ socket.py Sun Sep 8 07:37:32 2024
@@ -948,6 +948,11 @@
narrow the list of addresses returned. Passing zero as a value for each of
these arguments selects the full range of results.
"""
+ # Provide consistent behavior across UNIX platforms (see issue #123832)
+ if isinstance(port, int) and type == 0 and proto == 0:
+ return getaddrinfo(host, port, family, SOCK_DGRAM, proto, flags) + \
+ getaddrinfo(host, port, family, SOCK_STREAM, proto, flags)
+
# We override this function since we want to translate the numeric family
# and socket type values to enum constants.
addrlist = []
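The idea in the PoC above can also be tried without patching the standard library. The following is a minimal sketch, assuming a standalone helper (the name `getaddrinfo_stream_and_dgram` is hypothetical and not part of CPython); unlike the PoC it queries `SOCK_STREAM` first, which is an arbitrary choice.

```python
import socket


def getaddrinfo_stream_and_dgram(host, port, family=0, type=0, proto=0, flags=0):
    """Hypothetical wrapper: query SOCK_STREAM and SOCK_DGRAM separately when
    the caller passes a numeric port but no socket type or protocol."""
    if isinstance(port, int) and type == 0 and proto == 0:
        merged = []
        # One lookup per socket type, so the C function always gets a type hint.
        for socktype in (socket.SOCK_STREAM, socket.SOCK_DGRAM):
            merged.extend(
                socket.getaddrinfo(host, port, family, socktype, proto, flags)
            )
        return merged
    # Anything else is passed through unchanged.
    return socket.getaddrinfo(host, port, family, type, proto, flags)
```

With this helper, a call such as `getaddrinfo_stream_and_dgram("127.0.0.1", 5000)` never reaches the C function without a socket type, which is the case that raises on Illumos in the reports below.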
|
I have just checked AIX 7.1 and AIX 7.3, and my speculation that they also exhibit inconsistent behavior is confirmed. In the case of AIX, if the socket type is not specified, it doesn't return all supported socket types, but only SOCK_DGRAM:
-bash-5.1$ python3
Python 3.9.19 (main, Apr 5 2024, 05:08:51)
[GCC 10.3.0] on aix
Type "help", "copyright", "credits" or "license" for more information.
>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000))]
I guess this supports my idea of just calling the function twice at the |
Just checked on Linux, and it also returns RAW in addition to STREAM & DGRAM:
zaytsev@fedora:~$ python3
Python 3.12.4 (main, Jun 7 2024, 00:00:00) [GCC 14.1.1 20240607 (Red Hat 14.1.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('127.0.0.1', 5000))]
So even the behavior of two major platforms (Linux and macOS) is inconsistent. |
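To compare platforms without reading long tuples, the results can be reduced to a count per socket type. This is only a rough diagnostic sketch; the helper name `count_socktypes` is made up for illustration.

```python
import socket
from collections import Counter


def count_socktypes(host, port, **hints):
    """Count getaddrinfo() results per socket type (hypothetical helper)."""
    infos = socket.getaddrinfo(host, port, **hints)
    # info[1] is usually a SocketKind enum member, but it can be a plain int
    # (for example 0) when the platform reports no socket type.
    return Counter(getattr(info[1], "name", str(info[1])) for info in infos)
```

Running `count_socktypes("127.0.0.1", 5000)` on each system should make the differences reported in this thread visible at a glance; on systems that raise `socket.gaierror` for this call, the exception propagates unchanged.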
Title changed from "socket.getaddrinfo on Solaris fails if service name (port) is specified, but socket type / protocol unspecified" to "getaddrinfo fails on Solaris, inconsistent on Linux, macOS if port is, but type & protocol are not specified"
Can you check whether the behaviour is still inconsistent if you use the same Python version (AFAICT, one is 3.9 for Solaris and the other is 3.12)? |
Yes, I can confirm that Python 3.9 behaves inconsistently on all platforms:
I can easily confirm that this behavior persists in Python 3.12 on Linux and macOS, but it will take me a huge effort to build Python 3.12 on AIX and Solaris from scratch to confirm that it's the same. However, I don't think it's absolutely necessary, because to my knowledge no relevant changes happened between 3.9 and 3.12, and for 3.12 the behavior remains inconsistent for at least 2 of the 4 platforms in question. |
I don't know if it's inconsistent or not actually. The docstring tells me:
Is RAW supported on macOS in general? On the other hand, we need to dig into https://github.com/python/cpython/blob/main/Modules/getaddrinfo.c to see how (and possibly why) the RAW family is not found. |
Well, throwing an exception on Solaris and returning only If you define consistency as each platform having its own interpretation of what "the full range of results" actually is, then yes, it might be consistent :)
I guess not, because when I explicitly request SOCK_RAW:
% python3.9
Python 3.9.20 (main, Sep 6 2024, 19:03:56)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_RAW)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000))]
Perhaps another argument to iterate over |
Yes that's what I had in mind. I think we rely on the libc implementation of
This in particular seems a bug if you can use
We'll need to investigate whether it's what is being returned by the OS or if we incorrectly fetch the results. I am not very familiar with this part of the code and cannot test it so someone more qualified will need to do it. |
My understanding is that The Python interface, however, has only two mandatory parameters (where the service name is actually not very mandatory, as you can pass So if that's still the goal, I don't see a way around enumerating socket types to provide a consistent interface for high-level users. As you can see from the reaction of the Flask-SocketIO maintainer, he doesn't see a problem with doing The Gevent people were bitten by the same bug and ended up adding hacks to their code: gevent/gevent#1256. Eventlet is still broken on Solaris and AIX. |
Hi, I am sorry I didn't react earlier, I completely missed the notification. I tested this on Oracle Solaris, and it differs from OpenIndiana...:
It behaves the same on 3.9.19, 3.11.9 and 3.13.0 (all runtimes we currently ship). There is no exception, but it still looks different from other platforms... |
Title changed from "getaddrinfo fails on Solaris, inconsistent on Linux, macOS if port is, but type & protocol are not specified" to "getaddrinfo fails on Illumos, inconsistent elsewhere if port is, but type & protocol are not specified"
Hi @kulikjak, thanks for your feedback! In the meantime, I was able to get Solaris 11.4 in a virtual machine and cannot confirm your findings. For me it's the same as Illumos. Is my VM outdated and the problem was somehow fixed in the latest version that you are using, or how should I understand these results?
Oracle Solaris 11.4.42.111.0 Assembled December 2021
root@solaris:~# python3
Python 3.7.10 (default, Nov 19 2021, 12:42:39)
[GCC 10.3.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.getaddrinfo("127.0.0.1", 5000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 9] service name not available for the specified socket type
>>> socket.getaddrinfo("127.0.0.1", 5000, family=socket.AF_INET)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 9] service name not available for the specified socket type
>>> socket.getaddrinfo("127.0.0.1", None)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 0))]
>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", 5000, proto=socket.IPPROTO_TCP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]
Anyways, sorry about transferring what I learned from OpenIndiana directly to Solaris. Sadly, since the Sun doesn't shine, Oracle doesn't make it easy for porters to get their hands on the "real" Solaris, so much work usually happens "by proxy" on Illumos / OpenIndiana. I've reported a bug against Illumos, let's see what they say. https://www.illumos.org/issues/16851
Still, irrespective of Solaris behavior, I think the issue for Python is very relevant: if the semantics of the standard library were meant to provide useful and consistent behavior without specifying the socket type, then it seems that there is no way around adding a loop to achieve meaningful results. |
Oh, well that's interesting. I guess something changed since then? I will investigate.
I know, it's nowhere near ideal... Looking at the version you tested it with, I presume you downloaded the first (now quite old) CBE release we did almost two years ago. I cannot promise anything, but I can share that we would really like to release another one pretty soon, and then hopefully keep doing so more often.
+1 |
So, I found out that the behavior was indeed changed in SRU63 (released last November); before that, it behaved as OpenIndiana. Apparently, we also encountered the same exceptions but decided that even though it's within the standard, we should adjust the behavior to mirror other systems more closely. |
That explains it! It turns out that the Illumos people found the same issue independently 9 months ago and see it as a bug, but haven't fixed it so far: https://www.illumos.org/issues/16273
Well, the problem of not being able to run Python-based web servers on Solaris out of the box is a practical one, so this decision makes sense.
Yes, I took the VM by https://github.com/vmactions - everything else was way too difficult. Thanks for your activities, hopefully they will help other people in the future. |
I'm not the expert (that would be @gpshead), so it's likely that I missed something, but it does look like an OS bug. The standard I could get my hands on, POSIX.1-2017, sounds unambiguous:
(Python maps host/port to nodename/servname, and unspecified family to Our official policy for unsupported platforms is ambiguous, but it looks like the workaround here is too big to accept, let alone backport to a bugfix branch. We won't test it in CI, after all. If you'd like to contribute a test, to codify what the intended behaviour is and possibly alert other platform maintainers about the issue, that'd be very nice. Does that sound fair? |
@encukou, just to confirm, your reading of POSIX is that
@kulikjak, since your team analyzed the standard when making the change for SRU63, can you say something as to why you are still not returning My reading of the standard is...
... that |
Ah, I see the ambiguous bit in the standard now. Thanks! What about a docs update like this one?
It includes wording similar to POSIX's “by providing options and by limiting the returned information”, which IMO suggests that the hints indeed limit the resulting list compared to the defaults, but can be interpreted differently.
It specifically says that it wraps the underlying C function, so the details are in OS docs.
The “full range of results” bit goes away.
It uses AF_UNSPEC rather than zero for the family default.
It suggests setting proto to
It suggests that the results should be tried in order, which is, AFAIK, best practice -- see RFC 6724 section 2 and its predecessor from 2003 (which are specific to IP, but indicate how people use this):
Does this look reasonable? (If so I'll open a PR, hold any nitpicks/typos until then.)
--- a/Doc/library/socket.rst
+++ b/Doc/library/socket.rst
@@ -928,7 +928,7 @@ The :mod:`socket` module also offers various network-related services:
.. versionadded:: 3.7
-.. function:: getaddrinfo(host, port, family=0, type=0, proto=0, flags=0)
+.. function:: getaddrinfo(host, port, family=AF_UNSPEC, type=0, proto=0, flags=0)
Translate the *host*/*port* argument into a sequence of 5-tuples that contain
all the necessary arguments for creating a socket connected to that service.
@@ -938,8 +938,9 @@ The :mod:`socket` module also offers various network-related services:
and *port*, you can pass ``NULL`` to the underlying C API.
The *family*, *type* and *proto* arguments can be optionally specified
- in order to narrow the list of addresses returned. Passing zero as a
- value for each of these arguments selects the full range of results.
+ in order to provide options and limit the list of addresses returned.
+ Pass :data:`AF_UNSPEC` as a value for *family*, and/or zero as a value for
+ *type* or *proto*, to not limit the results.
The *flags* argument can be one or several of the ``AI_*`` constants,
and will influence how results are computed and returned.
For example, :const:`AI_NUMERICHOST` will disable domain name resolution
@@ -959,6 +960,20 @@ The :mod:`socket` module also offers various network-related services:
:const:`AF_INET6`), and is meant to be passed to the :meth:`socket.connect`
method.
+ .. note::
+
+ This function wraps the C function ``getaddrinfo`` of the underlying
+ system.
+
+ With default values of *family*, *type*, *proto* and/or *flags*,
+ many systems will return a sorted list of all matching addresses,
+ which should generally be tried in order until a connection succeeds
+ (possibly in parallel, for example using a `Happy Eyeballs`_ algorithm).
+
+ Some systems will, however, only return a single address.
+ To ensure this address is a usable one, limit the options: for example,
+ set *type* to :const:`SOCK_STREAM` or :const:`SOCK_DGRAM`.
+
.. audit-event:: socket.getaddrinfo host,port,family,type,protocol socket.getaddrinfo
The following example fetches address information for a hypothetical TCP
@@ -978,6 +993,8 @@ The :mod:`socket` module also offers various network-related services:
for IPv6 multicast addresses, string representing an address will not
contain ``%scope_id`` part.
+.. _Happy Eyeballs: https://en.wikipedia.org/wiki/Happy_Eyeballs
+
.. function:: getfqdn([name])
Return a fully qualified domain name for *name*. If *name* is omitted or empty, |
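The note proposed in the diff above recommends limiting the socket type and trying the returned addresses in order. A minimal sketch of that pattern for a TCP client could look as follows; `connect_first_working` is a hypothetical name, and the standard library's `socket.create_connection` already implements a similar loop.

```python
import socket


def connect_first_working(host, port, timeout=5.0):
    """Try each address returned for a TCP service until one connects
    (hypothetical helper, not part of the socket module)."""
    last_error = None
    # Restricting type to SOCK_STREAM keeps UDP and raw entries out of the list.
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and move on to the next candidate address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("getaddrinfo returned no addresses")
```

The sketch tries addresses sequentially; a Happy Eyeballs style implementation, as mentioned in the note, would attempt several candidates in parallel instead.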
I like all these points. I think it really helps (at least it would have helped me before I opened this issue). I'm not entirely happy with the changes to the introduction & this part though:
My advice would be to always specify socket type and/or protocol, unless you actually don't plan to use the information returned, but rather check resolution or something. The reason for this is that, as we have seen, mainstream systems (and AIX) return many results, where the first one may not actually be a usable one, and Solaris / Illumos will return something if they don't crash, but you can't use that to open a socket. I've linked a couple of libraries in this issue, and what they mostly do is open a TCP socket, but the documentation leads the authors to believe that omitting all parameters and picking the first entry is the right thing to do. We can argue about whether it's the job of Python documentation to teach users about networking, but if it's no big deal, I'd try to write it in a way that spreads the sacred knowledge and reinforces best practices. To be specific, I would suggest something like this:
|
…mpliance
This changes nothing for CPython supported platforms, but hints how to deal with platforms that stick to the letter of the spec. It also marks `socket.getaddrinfo` as a wrapper around `getaddrinfo(3)`; specifically, workarounds to make the function work consistently across platforms are out of scope in its code.
Include wording similar to POSIX's “by providing options and by limiting the returned information”, which IMO suggests that the hints limit the resulting list compared to the defaults, *but* can be interpreted differently. Details are added in a note.
Specifically say that this wraps the underlying C function. So, the details are in OS docs. The “full range of results” bit goes away.
Use `AF_UNSPEC` rather than zero for the *family* default, although I don't think a system where it's nonzero would be very usable.
Suggest setting proto and/or type (with examples, as the appropriate values aren't obvious). Say why you probably want to do that on all systems; mention the behavior on the “letter of the spec” systems.
Suggest that the results should be tried in order, which is, AFAIK, best practice -- see RFC 6724 section 2, and its predecessor from 2003 (which are specific to IP, but indicate how people use this):
> Well-behaved applications SHOULD iterate through the list of
> addresses returned from `getaddrinfo()` until they find a working address.
Thanks! |
…ce (GH-126182)
* gh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance
Co-authored-by: Carol Willing <[email protected]>
…mpliance (pythonGH-126182)
* pythongh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance
(cherry picked from commit ff0ef0a)
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
…ompliance (GH-126182) (GH-126825)
gh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance (GH-126182)
(cherry picked from commit ff0ef0a)
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
…ompliance (GH-126182) (GH-126824)
gh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance (GH-126182)
(cherry picked from commit ff0ef0a)
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
Bug report
Bug description:
I have discovered that on OpenIndiana, if you don't specify at least socket type or protocol for the service name, address resolution fails (see below).
I've filed a PR to fix the problem at the call site of the software that exposed it (miguelgrinberg/Flask-SocketIO#2088), but the maintainer argues that it's a bug in CPython and is unwilling to accept a patch that specifies the protocol.
His position that Python's high-level APIs should provide consistent behavior across platforms is understandable, although one could argue that getaddrinfo("127.0.0.1", 5000) is a borderline invalid use of the API. However, the documentation as it stands suggests that None can be passed, but doesn't explain why you would want to do so, and the paragraph below suggests that keyword arguments act as AND filters and can be safely omitted.
Unfortunately, I don't see any fantastic way to fix this in CPython that would provide a consistent interface across platforms. I think we could do a configure check to detect this getaddrinfo behavior (might also be the case on AIX, Tru64, etc., although it's just a speculation) and then call the C function twice at Python level (with type=socket.SOCK_STREAM and type=socket.SOCK_DGRAM) to merge the results (which is already done for another reason):
cpython/Lib/socket.py
Line 964 in beee91c
Or, if you don't mind, we can just always do it where protocol AND socket type are unspecified, but port is specified and is numeric. This is very simple, the overhead (I think) is negligible, and it is consistent across platforms.
I would appreciate an opinion from Python developers and/or Solaris engineers before I put more work into this.
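For comparison, the call-site approach mentioned above (specifying the socket type where `getaddrinfo` is used) can be sketched as below. This is not the actual Flask-SocketIO patch; the helper `bind_tcp_listener` and its use of `AI_PASSIVE` and `SO_REUSEADDR` are assumptions made for illustration.

```python
import socket


def bind_tcp_listener(host, port):
    """Resolve host/port for TCP only and return a listening socket
    (hypothetical helper illustrating a call-site fix)."""
    infos = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM, flags=socket.AI_PASSIVE
    )
    # Because the type hint is given, every entry is a stream address.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(sockaddr)
    sock.listen()
    return sock
```

Because the hint is passed explicitly, the first entry is always a stream address, so picking `infos[0]` does not depend on how a platform orders or filters results when no type is given.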
OpenIndiana
macOS
AIX
Solaris
FreeBSD 14.1
Linux (Fedora)
Summary
/cc @kulikjak
CPython versions tested on:
3.9
Operating systems tested on:
Other
Linked PRs
socket.getaddrinfo docs for better POSIX compliance #126182
socket.getaddrinfo docs for better POSIX compliance (GH-126182) #126824
socket.getaddrinfo docs for better POSIX compliance (GH-126182) #126825