-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tool conventions] Should a function pointer be 64-bits? #2
Comments
The current LLVM implementation uses 64-bit function pointers (because having different size pointers for different things, while possible in LLVM, is going to trip up all sorts of C/C++ code that assumes all pointers are the same size). These values then get truncated just before a It would of course be nice not to have that truncate op, but personally I would estimate In an ideal world, it would be nice for |
I don't have a strong opinion about this; big picture, 32-bit function pointers would be more work for small gains. This is a different situation from For completeness, here's what a case for making function pointers on wasm64 32-bit might look like: It would make in-memory vtables and similar data structures more compact, and would follow the spirit of C of having the builtin types reflect what's natural for the target machine. Wasm's "function pointers" really are in a different "address space". And it appears LLVM does have support for pointers of different sizes; see the |
@sunfishcode I am aware it is possible to make them 32-bit in LLVM, I am just not following how this is not going to break a ton of C/C++ code (code that is maybe not legal, but never the less). While by-value conversion to and from I'm aware such code would likely not be strict aliasing safe either, but that probably holds for a lot of existing C/C++ code out there. This could happen in some legacy library that someone is trying to port, and would be super hard to debug, and only happen when running under Wasm. These bugs would likely be entirely new to that code, since its never been built before for a platform that has different size pointers. I wouldn't think such issues are worth the savings for e.g. vtables. |
does wasm64 want to retain POSIX compatibility? |
I don't think there's one single answer to that question. POSIX covers a lot of things (most of them not relevant to this particular issue) and in general we try to be compatible where it makes sense, but don't make huge compromises to get it. Most of those tradeoffs happen in e.g. emscripten. For this particular issue, I actually don't think (AFAIK anyway) POSIX says anything about the size of function pointers. Different ABIs on the same architecture can already have different C type sizes and layouts, but those ABIs are defined by documents such as the SYSV ABI (Linux) or Win32 ABI (Windows) on x86, or the ARM AAPCS spec. They don't necessarily match across architectures or operating systems. |
we know there are a few places where POSIX actively assumes the memory and function pointer sizes are compatible; dlsym being one of the more obvious. |
Ah, I see what you mean. I think the general answer to your question is still less about conforming to POSIX per se, and more about balancing trying to work with as much existing code as possible, while having the best possible quality by other metrics, and dealing with the constraints of the platform. Emscripten's dynamic loading code is actually a pretty good example of this. We try to implement dlopen and dlsym, but the web environment is quite unlike the POSIX-y environments wrt code loading, since there may not be a filesystem (so we support URLs instead) and loading code is asynchronous on the main thread (so correct dlopen semantics are not always even possible). wrt your point here, I think you are getting at a similar issue as @aardappel's last comment. In the case of dlsym I think casting from void* to a function pointer and back should work ok as long as the underlying value is correct (i.e. within the table bounds) in the first place. It will be a size-changing cast (which the author likely did not expect) and there could be code that breaks when void* is not the same size as function pointer, as @aardappel also pointed out. So here we are balancing the likelihood of breaking some code against the benefits such as the ones @sunfishcode mentioned (e.g. saving memory in structures such as vtables). |
Personally, we know we treat them as having the same size when dealing with hexchat plugins, but we would never expect hexchat to run in a wasm environment anyway. Further, we wonder if it'd be feasible to have a native vtable format, instead of having to load vtables into wasm memory? That would be very useful. You could transparently compress those. |
@SoniEx2 Function pointers are implemented in LLVM as 64-bit for wasm64, so unless someone intends to reverse that design/implementation, you have the compatibility you're looking for. |
If you mean writing the value into a memory location like a function-pointer struct member, then you'd first have to cast the void* to a function pointer (reducing its size) before writing it to memory. Of course in C it's possible do this in some other type-unsafe ways (which ends up being undefined behavior), which wouldn't work. Surely there would exist programs that don't do it right, and these would break. Technically if there is UB it's the program's problem and not the compiler's, but of course all else being equal it's still better not to break such programs than to break them.
With the GC proposal you could build vtables out of GC structs or arrays. In the past we even kicked around the idea of trying to make clang lay out vtables as slices of the table rather than in memory (thus allowing method calls to use fewer loads). I think that might be technically possible but it would be a substantial change to clang, and there may be other showstoppers we haven't thought of. |
This is an example from an older version of POSIX: https://pubs.opengroup.org/onlinepubs/009604499/functions/dlsym.html The example from the current version of POSIX would work correctly, however: https://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html The only other discussion we can find related to this is in the gcc mailing list: https://www.mail-archive.com/[email protected]/msg93399.html |
Now that #46 is resolved and we are getting 64-tables it seems like there is no good reason to pursue this anymore. |
The tools will likely consider this feature to be "wasm64", but it doesn't yet have 64-bit function table indexes. Since C/C++ function pointers are lowered to function table entries, this means that they are 32-bit pointers. Should they be? Or should the tools use 64-bit "pointers" where the top 32-bits are empty?
cc @sunfishcode @jgravelle-google
The text was updated successfully, but these errors were encountered: