
getaddrinfo fails on Illumos, inconsistent elsewhere if port is, but type & protocol are not specified #123832

Closed
zyv opened this issue Sep 8, 2024 · 19 comments
Labels
docs Documentation in the Doc dir extension-modules C modules in the Modules dir

Comments

@zyv

zyv commented Sep 8, 2024

Bug report

Bug description:

I have discovered that

  1. On OpenIndiana (an Illumos distribution), if you pass a port (service name) but specify neither a socket type nor a protocol, address resolution fails (see below).
  2. Linux and macOS happily ignore this and resolve the address anyway.
  3. Specifying the socket family doesn't help.

I've filed a PR to fix the problem at the call site of the software that exposed it (miguelgrinberg/Flask-SocketIO#2088), but the maintainer argues that it's a bug in CPython and is unwilling to accept a patch that specifies the protocol.

His position that Python's high-level APIs should provide consistent behavior across platforms is understandable, although one could argue that getaddrinfo("127.0.0.1", 5000) is a borderline invalid use of the API. However, the documentation as it stands suggests that None can be passed, but doesn't explain why you would want to do so, and the paragraph below suggests that keyword arguments act as AND filters and can be safely omitted.

Unfortunately, I don't see any fantastic way to fix this in CPython that would provide a consistent interface across platforms. I think we could add a configure check to detect this getaddrinfo behavior (AIX, Tru64, etc. might be affected too, although that's just speculation) and then call the C function twice at the Python level, once with type=socket.SOCK_STREAM and once with type=socket.SOCK_DGRAM, merging the results in the Python-level wrapper that already exists for another reason:

def getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):

Or, if you don't mind, we could simply always do this when both protocol AND socket type are unspecified but a numeric port is given. This is very simple, the overhead is (I think) negligible, and the behavior is consistent across platforms.

I would appreciate an opinion from Python developers and/or Solaris engineers before I put more work into this.

OpenIndiana

$ python3
Python 3.9.19 (main, Mar 26 2024, 20:30:24) [GCC 13.2.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 9] service name not available for the specified socket type

>>> socket.getaddrinfo("127.0.0.1", 5000, family=socket.AF_INET)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 9] service name not available for the specified socket type

>>> socket.getaddrinfo("127.0.0.1", None)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 0))]

>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]

>>> socket.getaddrinfo("127.0.0.1", 5000, proto=socket.IPPROTO_TCP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]

macOS

% python3
Python 3.12.5 (main, Aug  6 2024, 19:08:49) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000))]

AIX

-bash-5.1$ python3
Python 3.9.19 (main, Apr  5 2024, 05:08:51) 
[GCC 10.3.0] on aix
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000))]

>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000))]

Solaris

$ python3
Python 3.9.19 (main, Sep 23 2024, 16:24:38) 
[GCC 13.2.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", 5000, family=socket.AF_INET)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", None)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 0))]
>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", 5000, proto=socket.IPPROTO_TCP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]

FreeBSD 14.1

[zaytsev@freebsd ~]$ python3
Python 3.11.9 (main, Oct  5 2024, 11:59:33) [Clang 18.1.5 (https://github.com/llvm/llvm-project.git llvmorg-18.1.5-0-g617a15 on freebsd14
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_SEQPACKET: 5>, 132, '', ('127.0.0.1', 5000))]

>>> socket.getaddrinfo("127.0.0.1", None)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_SEQPACKET: 5>, 132, '', ('127.0.0.1', 0))]

Linux (Fedora)

zaytsev@fedora:~$ python3
Python 3.12.4 (main, Jun  7 2024, 00:00:00) [GCC 14.1.1 20240607 (Red Hat 14.1.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('127.0.0.1', 5000))]

Summary

  • Linux: STREAM, DGRAM, RAW
  • macOS: STREAM, DGRAM
  • OpenBSD: STREAM, DGRAM
  • FreeBSD with Cheri: STREAM, DGRAM, SEQPACKET
  • AIX: DGRAM (but STREAM supported, obviously!)
  • Solaris/Illumos: exception
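To reproduce a summary like the one above on another machine, a small probe can help. This is a sketch that assumes a numeric host and port, so no DNS lookup is involved. `default_socket_kinds` is a hypothetical helper name, not part of the socket module.

```python
import socket


def default_socket_kinds(host="127.0.0.1", port=5000):
    """Hypothetical probe: which socket types does getaddrinfo() report
    when only host and port are given?

    Returns None if the platform's getaddrinfo(3) rejects the call outright
    (as OpenIndiana does in the session above).
    """
    try:
        results = socket.getaddrinfo(host, port)
    except socket.gaierror:
        return None
    kinds = []
    # Each entry is (family, type, proto, canonname, sockaddr); keep the types
    # in resolver order and skip duplicates (e.g. one per address family).
    for _family, kind, _proto, _canonname, _sockaddr in results:
        if kind not in kinds:
            kinds.append(kind)
    return kinds


if __name__ == "__main__":
    kinds = default_socket_kinds()
    if kinds is None:
        print("exception (gaierror)")
    else:
        # Types unknown to SocketKind (such as 0 on Solaris) come back as plain ints.
        print(", ".join(getattr(kind, "name", str(kind)) for kind in kinds))
```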

/cc @kulikjak

CPython versions tested on:

3.9

Operating systems tested on:

Other


@zyv zyv added the type-bug An unexpected behavior, bug, or error label Sep 8, 2024
@zyv
Author

zyv commented Sep 8, 2024

PoC

--- socket.py.orig      Tue Mar 26 20:30:27 2024
+++ socket.py   Sun Sep  8 07:37:32 2024
@@ -948,6 +948,11 @@
     narrow the list of addresses returned. Passing zero as a value for each of
     these arguments selects the full range of results.
     """
+    # Provide consistent behavior across UNIX platforms (see issue #123832)
+    if isinstance(port, int) and type == 0 and proto == 0:
+        return getaddrinfo(host, port, family, SOCK_DGRAM, proto, flags) + \
+            getaddrinfo(host, port, family, SOCK_STREAM, proto, flags)
+
     # We override this function since we want to translate the numeric family
     # and socket type values to enum constants.
     addrlist = []

$ python3
Python 3.9.19 (main, Mar 26 2024, 20:30:24)
[GCC 13.2.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 1>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]
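The same idea as the PoC can also be tried outside socket.py, as a plain wrapper that needs no patching of the standard library. This is a sketch, not CPython code: `getaddrinfo_all_types` is a hypothetical name, and, like the PoC, it only kicks in for an integer port with both *type* and *proto* left at zero.

```python
import socket


def getaddrinfo_all_types(host, port, family=0, type=0, proto=0, flags=0):
    """Hypothetical wrapper mirroring the PoC patch.

    If the port is an int and neither a socket type nor a protocol was given,
    query SOCK_DGRAM and SOCK_STREAM separately and concatenate the results,
    so platforms that reject or narrow the unconstrained query still report
    both. Otherwise, defer to socket.getaddrinfo() unchanged.
    """
    if isinstance(port, int) and type == 0 and proto == 0:
        return (socket.getaddrinfo(host, port, family, socket.SOCK_DGRAM, proto, flags)
                + socket.getaddrinfo(host, port, family, socket.SOCK_STREAM, proto, flags))
    return socket.getaddrinfo(host, port, family, type, proto, flags)
```

As in the PoC, a numeric string port such as "5000" bypasses the special case; whether it should be covered too is left open here.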

@zyv
Author

zyv commented Sep 8, 2024

I have just checked AIX 7.1 and AIX 7.3, and my speculation that they also behave inconsistently is confirmed. On AIX, if the socket type is not specified, getaddrinfo doesn't return all supported socket types, but only SOCK_DGRAM instead of the more commonly used SOCK_STREAM, even though SOCK_STREAM is supported:

-bash-5.1$ python3
Python 3.9.19 (main, Apr  5 2024, 05:08:51) 
[GCC 10.3.0] on aix
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000))]

>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000))]

I guess this supports my idea of just calling the function twice at the socket.py level to provide consistent behavior across UNIX platforms.

@zyv
Author

zyv commented Sep 8, 2024

I just checked on Linux: it also returns RAW in addition to STREAM & DGRAM:

zaytsev@fedora:~$ python3
Python 3.12.4 (main, Jun  7 2024, 00:00:00) [GCC 14.1.1 20240607 (Red Hat 14.1.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('127.0.0.1', 5000))]

So even the behavior of two major platforms (Linux and macOS) is inconsistent.

@zyv zyv changed the title socket.getaddrinfo on Solaris fails if service name (port) is specified, but socket type / protocol unspecified getaddrinfo fails on Solaris, inconsistent on Linux, macOS if port is, but type & protocol are not specified Sep 8, 2024
@picnixz picnixz added the stdlib Python modules in the Lib dir label Sep 8, 2024
@picnixz
Contributor

picnixz commented Sep 8, 2024

> So even the behavior of two major platforms (Linux and macOS) is inconsistent.

Can you check whether the behaviour is still inconsistent if you use the same Python version (AFAICT, one is 3.9 for Solaris and the other is 3.12)?

@zyv
Author

zyv commented Sep 8, 2024

> Can you check whether the behaviour is still inconsistent if you use the same Python version (AFAICT, one is 3.9 for Solaris and the other is 3.12)?

Yes, I can confirm that Python 3.9 behaves inconsistently on all platforms:

  • Linux: STREAM, DGRAM, RAW
  • macOS: STREAM, DGRAM
  • OpenBSD: STREAM, DGRAM
  • FreeBSD with Cheri: STREAM, DGRAM, SEQPACKET
  • AIX: DGRAM (but STREAM supported, obviously!)
  • Illumos: exception

I can easily confirm that this behavior persists in Python 3.12 on Linux and macOS, but it will take me a huge effort to build Python 3.12 on AIX and Solaris from scratch to confirm that it's the same.

However, I don't think it's absolutely necessary, because to my knowledge no relevant changes happened between 3.9 and 3.12, and for 3.12 the behavior remains inconsistent for at least 2 of the 4 platforms in question.

@picnixz
Contributor

picnixz commented Sep 8, 2024

I don't know if it's inconsistent or not actually. The docstring tells me:

> Passing zero as a value for each of these arguments selects the full range of results.

Is RAW supported on macOS in general? On the other hand, we need to dig into https://github.com/python/cpython/blob/main/Modules/getaddrinfo.c to see how (and possibly why) the SOCK_RAW socket type is not found.

@picnixz picnixz added extension-modules C modules in the Modules dir and removed stdlib Python modules in the Lib dir labels Sep 8, 2024
@zyv
Author

zyv commented Sep 8, 2024

> I don't know if it's inconsistent or not actually. The docstring tells me:
>
> > Passing zero as a value for each of these arguments selects the full range of results.

Well, throwing an exception on Solaris and returning only SOCK_DGRAM when SOCK_STREAM is definitely supported on AIX doesn't sound like "the full range of results" to me.

If you define consistency as each platform having its own interpretation of what "the full range of results" actually is, then yes, it might be consistent :)

> Is RAW supported on macOS in general?

I guess not, because when I explicitly request SOCK_RAW, it still only returns SOCK_STREAM and SOCK_DGRAM (although I would have expected an empty list):

% python3.9              
Python 3.9.20 (main, Sep  6 2024, 19:03:56) 
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_RAW)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 5000)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 5000))]

Perhaps this is another argument for iterating over SOCK_STREAM, SOCK_DGRAM and SOCK_RAW, collecting the results in a set and returning a list.
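A variant of that idea, sketched below under the assumption that the caller names the socket types it wants, also guards against resolvers that ignore the type hint, as macOS appears to do for SOCK_RAW in the session above. `getaddrinfo_for_types` is a hypothetical helper. It deduplicates with a dict rather than a set so the resolver's ordering survives.

```python
import socket


def getaddrinfo_for_types(host, port, types, family=0, proto=0, flags=0):
    """Hypothetical helper: resolve once per requested socket type.

    Entries whose socket type differs from the one requested are dropped,
    because some resolvers ignore the hint. Duplicates are removed while
    keeping resolver order, which a plain set() would lose.
    """
    merged = {}
    for sock_type in types:
        try:
            results = socket.getaddrinfo(host, port, family, sock_type, proto, flags)
        except socket.gaierror:
            continue  # this type is not available for host/port on this system
        for entry in results:
            if entry[1] == sock_type:
                merged.setdefault(entry, None)  # dicts preserve insertion order
    return list(merged)
```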

@picnixz
Contributor

picnixz commented Sep 8, 2024

> If you define consistency as each platform having its own interpretation of what "the full range of results" actually is, then yes, it might be consistent :)

Yes, that's what I had in mind. I think we rely on the libc implementation of getaddrinfo(3), so the results are platform-dependent and I guess we cannot really do more.

> Well, throwing an exception on Solaris and returning only SOCK_DGRAM when SOCK_STREAM is definitely supported on AIX doesn't sound like "the full range of results" to me.

This in particular seems like a bug if you can use SOCK_STREAM.

> I guess not, because when I explicitly request SOCK_RAW, it still only returns SOCK_STREAM and SOCK_DGRAM (although I would have expected an empty list):

We'll need to investigate whether it's what is being returned by the OS or if we incorrectly fetch the results. I am not very familiar with this part of the code and cannot test it so someone more qualified will need to do it.

@zyv
Author

zyv commented Sep 8, 2024

> We'll need to investigate whether it's what is being returned by the OS or if we incorrectly fetch the results. I am not very familiar with this part of the code and cannot test it so someone more qualified will need to do it.

My understanding is that getaddrinfo(3) is a low-level interface, and C-level users never really need to omit both ai_socktype and ai_protocol at the same time. Usually, if you really don't care about the protocol, you at least set the socket type, otherwise how are you supposed to handle it when you want stream and get datagram instead?

The Python interface, however, has only two mandatory parameters (and even the service name is not really mandatory, since you can pass None instead), and has the semantics that you have posted. Unfortunately, the low-level interface just doesn't match them. If you look at the CPython implementation of the getaddrinfo wrapper, you will see lots of weird hacks, like setting the service name to "00" on macOS, or the Tru64 stuff. Obviously, the intent was to abstract these differences away and provide platform-independent semantics to users of the high-level interface.

So if that's still the goal, I don't see a way around enumerating socket types to provide a consistent interface for high-level users. As you can see from the reaction of the Flask-SocketIO maintainer, he doesn't see a problem with doing getaddrinfo("127.0.0.1", 5000) without setting the protocol or socket type because the documentation says it's fine, and expects Python to abstract the low-level getaddrinfo differences from him.
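For a server that wants to listen on a host and port, as in the Flask-SocketIO case, a more portable resolution step could look like the sketch below. It assumes TCP is what the caller wants. `open_listener` is a hypothetical helper. With a type hint, every platform in this thread returned entries with a concrete socket type. AI_PASSIVE only changes the result when *host* is None (it then selects the wildcard address).

```python
import socket


def open_listener(host, port, backlog=5):
    """Hypothetical helper: resolve host/port for TCP and bind a listening socket.

    Passing type=SOCK_STREAM instead of relying on the defaults means each
    entry names a real socket type; entries are tried in resolver order.
    """
    last_error = None
    for family, sock_type, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM, flags=socket.AI_PASSIVE):
        try:
            sock = socket.socket(family, sock_type, proto)
        except OSError as exc:
            last_error = exc
            continue
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind(sockaddr)
            sock.listen(backlog)
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
    raise last_error or OSError(f"no usable address for {host!r}:{port!r}")
```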

The Gevent people were bitten by the same bug and ended up adding hacks to their code: gevent/gevent#1256. Eventlet is still broken on Solaris and AIX.

@kulikjak
Contributor

Hi, I am sorry I didn't react earlier, I completely missed the notification.

I tested this on Oracle Solaris, and it differs from OpenIndiana:

$ python3
Python 3.9.19 (main, Sep 23 2024, 16:24:38) 
[GCC 13.2.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import socket
>>> socket.getaddrinfo("127.0.0.1", 5000)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", 5000, family=socket.AF_INET)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", None)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 0))]
>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", 5000, proto=socket.IPPROTO_TCP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]

It behaves the same on 3.9.19, 3.11.9 and 3.13.0 (all runtimes we currently ship).

There is no exception, but it still looks different from other platforms...

@zyv zyv changed the title getaddrinfo fails on Solaris, inconsistent on Linux, macOS if port is, but type & protocol are not specified getaddrinfo fails on Illumos, inconsistent elsewhere if port is, but type & protocol are not specified Oct 23, 2024
@zyv
Author

zyv commented Oct 23, 2024

Hi @kulikjak, thanks for your feedback!

In the meantime, I was able to get Solaris 11.4 in a virtual machine and cannot confirm your findings. For me it's the same as Illumos. Is my VM outdated, and was the problem somehow fixed in the later version that you are using? Or how should I interpret these results?

Oracle Solaris 11.4.42.111.0                  Assembled December 2021
root@solaris:~# python3
Python 3.7.10 (default, Nov 19 2021, 12:42:39) 
[GCC 10.3.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.getaddrinfo("127.0.0.1", 5000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 9] service name not available for the specified socket type
>>> socket.getaddrinfo("127.0.0.1", 5000, family=socket.AF_INET)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 9] service name not available for the specified socket type
>>> socket.getaddrinfo("127.0.0.1", None)
[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 0))]
>>> socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]
>>> socket.getaddrinfo("127.0.0.1", 5000, proto=socket.IPPROTO_TCP)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 2>, 6, '', ('127.0.0.1', 5000))]

Anyways, sorry about transferring what I learned from OpenIndiana directly to Solaris. Sadly, since the Sun doesn't shine, Oracle doesn't make it easy for porters to get their hands on the "real" Solaris, so much work usually happens "by proxy" on Illumos / OpenIndiana. I've reported a bug against Illumos, let's see what they say.

https://www.illumos.org/issues/16851

Still, irrespective of the Solaris behavior, I think the issue for Python is very relevant: if the semantics of the standard library were meant to provide a useful and consistent behavior without specifying the socket type, then it seems that there is no way around adding a loop to achieve meaningful results.

@kulikjak
Contributor

kulikjak commented Oct 24, 2024

> In the meantime, I was able to get Solaris 11.4 in a virtual machine and cannot confirm your findings. For me it's the same as Illumos.

Oh, well that's interesting. I guess something changed since then? I will investigate.

> Anyways, sorry about transferring what I learned from OpenIndiana directly to Solaris. Sadly, since the Sun doesn't shine, Oracle doesn't make it easy for porters to get their hands on the "real" Solaris, so much work usually happens "by proxy" on Illumos / OpenIndiana.

I know, it's nowhere near ideal...

Looking at the version you tested it with, I presume you downloaded the first CBE release we did almost two years ago (now quite old). I cannot promise anything, but I can share that we would really like to release another one pretty soon, and then hopefully keep doing so more often.

> Still, irrespective of the Solaris behavior, I think the issue for Python is very relevant: if the semantics of the standard library were meant to provide a useful and consistent behavior without specifying the socket type, then it seems that there is no way around adding a loop to achieve meaningful results.

+1

@kulikjak
Contributor

So, I found out that the behavior was indeed changed in SRU63 (released last November); before that, it behaved as OpenIndiana.

Apparently, we also encountered the same exceptions but decided that even though it's within the standard, we should adjust the behavior to mirror other systems more closely.

@zyv
Author

zyv commented Oct 25, 2024

> So, I found out that the behavior was indeed changed in SRU63 (released last November); before that, it behaved as OpenIndiana.

That explains it! It turns out that the Illumos people found the same issue independently 9 months ago and see it as a bug, but haven't fixed it so far:

https://www.illumos.org/issues/16273

> Apparently, we also encountered the same exceptions but decided that even though it's within the standard, we should adjust the behavior to mirror other systems more closely.

Well, not being able to run Python-based web servers on Solaris out of the box is a practical problem, so this decision makes sense.

> I know, it's nowhere near ideal...

Yes, I took the VM from https://github.com/vmactions; everything else was way too difficult. Thanks for your efforts, hopefully they will help other people in the future.

@encukou
Member

encukou commented Oct 25, 2024

I'm not the expert (that would be @gpshead), so it's likely that I missed something, but it does look like an OS bug. The standard I could get my hands on, POSIX.1-2017, sounds unambiguous:

> If the ai_family field to which hints points has the value AF_UNSPEC, addresses shall be returned for use with any address family that can be used with the specified nodename and/or servname.

(Python maps host/port to nodename/servname, and unspecified family to AF_UNSPEC)
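In Python terms, that mapping means that omitting *family* and passing AF_UNSPEC explicitly should give the same answer. A quick check, assuming a numeric IPv4 host so the result does not depend on DNS:

```python
import socket

# Omitting *family* uses the default of 0, which equals AF_UNSPEC on the
# platforms in this thread, so both calls should return the same list.
implicit = socket.getaddrinfo("127.0.0.1", 5000, type=socket.SOCK_STREAM)
explicit = socket.getaddrinfo("127.0.0.1", 5000, family=socket.AF_UNSPEC,
                              type=socket.SOCK_STREAM)
print(implicit == explicit)
```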


Our official policy for unsupported platforms is ambiguous, but it looks like the workaround here is too big to accept, let alone backport to a bugfix branch. We won't test it in CI, after all.
But if you build for such a platform, I'd encourage you to patch this.

If you'd like to contribute a test, to codify what the intended behaviour is and possibly alert other platform maintainers about the issue, that'd be very nice.

Does that sound fair?
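A test along the lines suggested above might look like the following. It is only a sketch: the class and method names are hypothetical, and it encodes one possible reading of the intended behaviour (no exception, at least one entry with a concrete socket type, and type hints respected), which the platforms in this thread do not all agree on.

```python
import socket
import unittest


class GetaddrinfoDefaultsTest(unittest.TestCase):
    """Hypothetical test codifying the behaviour discussed in this issue."""

    def test_numeric_port_without_type_or_proto(self):
        try:
            results = socket.getaddrinfo("127.0.0.1", 5000)
        except socket.gaierror as exc:
            self.fail(f"getaddrinfo raised {exc!r} without type/proto hints")
        self.assertTrue(results)
        # At least one entry should carry a real socket type, not 0.
        kinds = {entry[1] for entry in results}
        self.assertTrue(kinds - {0}, "no entry carries a concrete socket type")

    def test_type_hint_is_respected(self):
        for sock_type in (socket.SOCK_STREAM, socket.SOCK_DGRAM):
            with self.subTest(type=sock_type):
                results = socket.getaddrinfo("127.0.0.1", 5000, type=sock_type)
                self.assertTrue(results)
                for entry in results:
                    self.assertEqual(entry[1], sock_type)
```

It can be run with the standard unittest runner.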

@zyv
Author

zyv commented Oct 26, 2024

> If the ai_family field to which hints points has the value AF_UNSPEC, addresses shall be returned for use with any address family that can be used with the specified nodename and/or servname. [..] A value of zero for ai_protocol means that the caller shall accept any protocol.
>
> (Python maps host/port to nodename/servname, and unspecified family to AF_UNSPEC)

@encukou, just to confirm, your reading of POSIX is that getaddrinfo is actually supposed to return any possibly usable socket configurations, and not necessarily all?

  1. If yes (any), then all implementations (including AIX) other than Solaris / Illumos conform, and "mainstream" (Linux, macOS, BSD) go beyond by providing all supported options.

    • In this case, I would suggest submitting a documentation patch to Python to clarify that omitting the keyword arguments returns any, and not necessarily all, usable results.
  2. If no (all), then anything other than "mainstream" only partially conforms if at all, i.e. AIX returns just one type, even though more are usable, and Solaris/Illumos returns unusable results when it doesn't crash.

    • In this case, indeed, if you are not willing to accept a workaround in form of an outer loop, a test would be a better solution.

> Apparently, we also encountered the same exceptions but decided that even though it's within the standard, we should adjust the behavior to mirror other systems more closely.

@kulikjak, since your team analyzed the standard when making the change for SRU63, can you say something as to why you are still not returning ai_socktype, unlike AIX and everybody else?

My reading of the standard is...

> The fields ai_family, ai_socktype, and ai_protocol shall be usable as the arguments to the socket() function to create a socket suitable for use with the returned address.

... that ai_socktype must be specified.
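That sentence suggests a simple practical check: every entry should be accepted by socket(). The sketch below keeps only the entries for which socket() succeeds. `usable_entries` is a hypothetical helper, and which entries get dropped depends on what the local OS refuses.

```python
import socket


def usable_entries(results):
    """Hypothetical helper: keep the getaddrinfo() entries whose
    (family, type, proto) triple the OS accepts in a socket() call."""
    usable = []
    for entry in results:
        family, sock_type, proto = entry[:3]
        try:
            # Open and immediately close a probe socket; an OSError means this
            # triple cannot be used here (for example, an unsupported type).
            probe = socket.socket(family, sock_type, proto)
        except OSError:
            continue
        probe.close()
        usable.append(entry)
    return usable
```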

@encukou
Member

encukou commented Oct 28, 2024

Ah, I see the ambiguous bit in the standard now. Thanks!

What about a docs update like this one?

It includes wording similar to POSIX's “by providing options and by limiting the returned information”, which IMO suggests that the hints indeed limit the resulting list compared to the defaults, but can be interpreted differently.

It specifically says that it wraps the underlying C function, so, the details are in OS docs. The “full range of results” bit goes away.

It uses AF_UNSPEC rather than zero for the default, although I don't think a system where it's nonzero would be very usable.

It suggests setting type to SOCK_STREAM or SOCK_DGRAM -- you probably want that on all systems, anyway.

It suggests that the results should be tried in order, which is, AFAIK, best practice -- see RFC 6724 section 2 and its predecessor from 2003 (which are specific to IP, but indicate how people use this):

> Well-behaved applications SHOULD iterate through the list of addresses returned from getaddrinfo() until they find a working address.
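The iteration the RFC recommends can be sketched sequentially as follows. `connect_in_order` is a hypothetical name. The standard library's `socket.create_connection()` already runs essentially this loop for TCP, and a Happy Eyeballs implementation would race attempts instead of trying them one at a time.

```python
import socket


def connect_in_order(host, port, timeout=5.0):
    """Hypothetical helper: try each TCP address from getaddrinfo() in the
    order returned, and return the first socket that connects."""
    last_error = None
    for family, sock_type, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, sock_type, proto)
        except OSError as exc:
            last_error = exc
            continue
        sock.settimeout(timeout)  # the returned socket keeps this timeout
        try:
            sock.connect(sockaddr)
        except OSError as exc:
            sock.close()
            last_error = exc
            continue
        return sock
    raise last_error or OSError(f"getaddrinfo returned nothing for {host!r}")
```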

Does this look reasonable? (If so I'll open a PR, hold any nitpicks/typos until then.)

--- a/Doc/library/socket.rst
+++ b/Doc/library/socket.rst
@@ -928,7 +928,7 @@ The :mod:`socket` module also offers various network-related services:
 
    .. versionadded:: 3.7
 
-.. function:: getaddrinfo(host, port, family=0, type=0, proto=0, flags=0)
+.. function:: getaddrinfo(host, port, family=AF_UNSPEC, type=0, proto=0, flags=0)
 
    Translate the *host*/*port* argument into a sequence of 5-tuples that contain
    all the necessary arguments for creating a socket connected to that service.
@@ -938,8 +938,9 @@ The :mod:`socket` module also offers various network-related services:
    and *port*, you can pass ``NULL`` to the underlying C API.
 
    The *family*, *type* and *proto* arguments can be optionally specified
-   in order to narrow the list of addresses returned.  Passing zero as a
-   value for each of these arguments selects the full range of results.
+   in order to provide options and limit the list of addresses returned.
+   Pass :data:`AF_UNSPEC` as a value for *family*, and/or zero as a value for
+   *type* or *proto*, to not limit the results.
    The *flags* argument can be one or several of the ``AI_*`` constants,
    and will influence how results are computed and returned.
    For example, :const:`AI_NUMERICHOST` will disable domain name resolution
@@ -959,6 +960,20 @@ The :mod:`socket` module also offers various network-related services:
    :const:`AF_INET6`), and is meant to be passed to the :meth:`socket.connect`
    method.
 
+   .. note::
+
+      This function wraps the C function ``getaddrinfo`` of the underlying
+      system.
+
+      With default values of *family*, *type*, *proto* and/or *flags*,
+      many systems will return a sorted list of all matching addresses,
+      which should generally be tried in order until a connection succeeds
+      (possibly in parallel, for example using a `Happy Eyeballs`_ algorithm).
+
+      Some systems will, however, only return a single address.
+      To ensure this address is a usable one, limit the options: for example,
+      set *type* to :const:`SOCK_STREAM` or :const:`SOCK_DGRAM`.
+
    .. audit-event:: socket.getaddrinfo host,port,family,type,protocol socket.getaddrinfo
 
    The following example fetches address information for a hypothetical TCP
@@ -978,6 +993,8 @@ The :mod:`socket` module also offers various network-related services:
       for IPv6 multicast addresses, string representing an address will not
       contain ``%scope_id`` part.
 
+.. _Happy Eyeballs: https://en.wikipedia.org/wiki/Happy_Eyeballs
+
 .. function:: getfqdn([name])
 
    Return a fully qualified domain name for *name*. If *name* is omitted or empty,

@zyv
Author

zyv commented Oct 29, 2024

> It includes wording similar to POSIX's “by providing options and by limiting the returned information”, which IMO suggests that the hints indeed limit the resulting list compared to the defaults, but can be interpreted differently.

> It specifically says that it wraps the underlying C function, so, the details are in OS docs. The “full range of results” bit goes away.

> It uses AF_UNSPEC rather than zero for the default, although I don't think a system where it's nonzero would be very usable.

> It suggests setting type to SOCK_STREAM or SOCK_DGRAM -- you probably want that on all systems, anyway.

> It suggests that the results should be tried in order, [...]

I like all these points. I think it really helps (at least it would have helped me before I opened this issue).

I'm not entirely happy with the changes to the introduction & this part though:

      Some systems will, however, only return a single address.
      To ensure this address is a usable one, limit the options: for example,
      set *type* to :const:`SOCK_STREAM` or :const:`SOCK_DGRAM`.

My advice would be to always specify the socket type and/or protocol, unless you don't actually plan to use the returned information to open a socket, but merely want to check that resolution works, for example. The reason is that, as we have seen, mainstream systems return several results and AIX returns just one, and in both cases the first entry may not be the one you can actually use; Solaris / Illumos, when they don't fail outright, return something you can't use to open a socket.

I've linked a couple of libraries in this issue, and what they mostly do is open a TCP socket, but the documentation leads the authors to believe that omitting all parameters and picking the first entry is the right thing to do.

We can argue about whether it's the job of Python documentation to teach users about networking, but if it's no big deal, I'd try to write it in a way that spreads the sacred knowledge and reinforces best practices.

To be specific, I would suggest something like this:

The *family*, *type* and *proto* arguments can be optionally specified in order to provide options and limit the list of addresses returned. Consider limiting the results by *type* (e.g. :data:`SOCK_STREAM` or :data:`SOCK_DGRAM`) and/or *proto* (e.g. :data:`IPPROTO_TCP` or :data:`IPPROTO_UDP`) instead of using the defaults if you intend to use the results to create a socket (see note). Pass :data:`AF_UNSPEC` as a value for *family*, and/or zero as a value for *type* or *proto*, to not limit the results.
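Applied in code, that advice amounts to pinning both *type* and *proto* to the transport the caller intends to use. A minimal sketch, where `resolve_for_transport` and its "tcp"/"udp" keys are hypothetical:

```python
import socket

# Hypothetical mapping from a transport name to the (type, proto) hints.
_HINTS = {
    "tcp": (socket.SOCK_STREAM, socket.IPPROTO_TCP),
    "udp": (socket.SOCK_DGRAM, socket.IPPROTO_UDP),
}


def resolve_for_transport(host, port, transport):
    """Hypothetical helper: resolve host/port for "tcp" or "udp" by
    constraining both type and proto instead of relying on the defaults."""
    sock_type, proto = _HINTS[transport]
    return socket.getaddrinfo(host, port, type=sock_type, proto=proto)
```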

encukou added a commit to encukou/cpython that referenced this issue Oct 30, 2024
…mpliance

This changes nothing for CPython-supported platforms,
but hints how to deal with platforms that stick to the letter of
the spec.
It also marks `socket.getaddrinfo` as a wrapper around `getaddrinfo(3)`;
specifically, workarounds to make the function work consistently across
platforms are out of scope in its code.

Include wording similar to POSIX's “by providing options and by
limiting the returned information”, which IMO suggests that the
hints limit the resulting list compared to the defaults, *but* can
be interpreted differently. Details are added in a note.

Specifically say that this wraps the underlying C function. So, the
details are in OS docs. The “full range of results” bit goes away.

Use `AF_UNSPEC` rather than zero for the *family* default, although
I don't think a system where it's nonzero would be very usable.

Suggest setting proto and/or type (with examples, as the appropriate
values aren't obvious). Say why you probably want to do that
on all systems; mention the behavior on the “letter of the spec”
systems.

Suggest that the results should be tried in order, which is,
AFAIK best practice -- see RFC 6724 section 2, and its predecessor
from 2003 (which are specific to IP, but indicate how people use this):

> Well-behaved applications SHOULD iterate through the list of
> addresses returned from `getaddrinfo()` until they find a working address.
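The quoted advice (iterate until a working address is found) is roughly what `socket.create_connection` already does for TCP clients. Below is a minimal hand-written sketch of the same loop. The name `connect_first_working` is hypothetical, and the default timeout is an arbitrary assumption.

```python
import socket


def connect_first_working(host, port, timeout=5.0):
    """Try each TCP candidate in the order getaddrinfo() returns them."""
    last_error = None
    for family, type_, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, type_, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # the first address that accepts the connection wins
        except OSError as exc:
            last_error = exc  # remember the failure and try the next entry
            if sock is not None:
                sock.close()
    if last_error is not None:
        raise last_error
    raise OSError(f"getaddrinfo returned no addresses for {host!r}")
```

Trying the entries in the returned order respects the system's address-selection preferences (RFC 6724) instead of blindly using the first entry.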
@encukou
Member

encukou commented Oct 30, 2024

Thanks!
The PR is at #126182, with yet more rewordings.

encukou added a commit that referenced this issue Nov 14, 2024
…ce (GH-126182)

* gh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance

Co-authored-by: Carol Willing <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 14, 2024
…mpliance (pythonGH-126182)

* pythongh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance

(cherry picked from commit ff0ef0a)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 14, 2024
…mpliance (pythonGH-126182)

* pythongh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance

(cherry picked from commit ff0ef0a)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
encukou added a commit that referenced this issue Nov 15, 2024
…ompliance (GH-126182) (GH-126825)

gh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance (GH-126182)

(cherry picked from commit ff0ef0a)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
encukou added a commit that referenced this issue Nov 15, 2024
…ompliance (GH-126182) (GH-126824)

gh-123832: Adjust `socket.getaddrinfo` docs for better POSIX compliance (GH-126182)

(cherry picked from commit ff0ef0a)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
@encukou encukou added docs Documentation in the Doc dir and removed type-bug An unexpected behavior, bug, or error labels Nov 15, 2024
@encukou encukou closed this as completed Nov 15, 2024
Projects
Status: Todo

4 participants