elf: replace .got.zig with a zig jump table #21065
Couldn't the function pointer point to the address in the jump table? Then the pointer would also be correct after that jump table address has been updated. |
I still think a jump table is better:
|
That one, I have no experience or intuition with so will hand over to @jacobly0 instead.
We increase fragmentation, that is true, but we still leave
How is function pointer alignment problem solved with a jump table?
Actually bookkeeping is largely unchanged whether the jump table is as one big block or distributed. If a function has a trampoline, it gets an
Incremental linking does not become more complex because of trampolines, since we simply create and add a new atom+symbol in place of the old one with reduced size of X, where X is the trampoline size. All of this happens (will happen) using the same algorithm we already use for allocating and freeing atoms, so I don't see any added complexity beyond actually creating a new atom/symbol for the said trampoline. If I am not seeing something obvious though, please do let me know.

Hot code swapping will indeed become more complicated because of trampolines, I believe, for the reasons mentioned by @mlugg.

tl;dr I would like us to reach a consensus on how to proceed so that I don't have to revert the changes immediately after committing them. Also, if you feel we should go back to an offset table, this is also fine fwiw. I just think that ease of maintaining and developing codegen backends is of higher priority than linker complexity IMHO, or to put it another way, I want to make codegen backends completely separated from the concept of incremental linking. |
Function pointers are always pointer-size aligned when using a jump table. |
I guess I should have asked this first: do we indirect pointers to functions via a jump table too? If so, this will not currently succeed:

    test "align(N) on functions" {
        try expect((@intFromPtr(&overaligned_fn) & (0x1000 - 1)) == 0);
    }
    fn overaligned_fn() align(0x1000) i32 {
        return 42;
    }

|
Also please note that every entry in a jump table is not pointer-size aligned, but instruction aligned. Perhaps you are referring to an offset table? |
This is easy to understand.
Here is a cache line full of jump table data:
All those F's are valid pointers to functions that might be used. Here is a cache line after a function has been relocated with the other strategy:
J - jump instruction to the real function |
OK, so it seems you mean that an offset table is a better solution than a jump table, whether the jump table is in one big block or distributed. |
This means that with the function header trampoline strategy, you have to keep an old symbol data entry around for the now-deleted function, whereas with the jump table strategy, you don't. |
No, my mistake for saying "function pointer" when I should have said "jump instruction", but otherwise the same point stands. It looks like on x86_64, a jump with an absolute address is 5 bytes, which is annoying, but it's fine. Function pointers will have alignment of 1 on that platform then. This test case will regress:

    test "align(N) on functions" {
        try expect((@intFromPtr(&overaligned_fn) & (0x1000 - 1)) == 0);
    }
    fn overaligned_fn() align(0x1000) i32 {
        return 42;
    }

The language will be modified to say when you take the address of a function, it does not necessarily gain the machine code alignment specified with the

Even if we go with the function prologue strategy resolution, I will still make this language change, because it is already evident this flexibility is useful for compilers. |
I am getting confused, let's agree on what is what if that's OK. We have 3 options under consideration:
FWIW I think there is a way to keep option 1, or at least reconsider it, while ensuring codegen backends are largely incremental linking agnostic by utilising the idea of lazy-symbol binding - we emit a section with immutable jump entries that always point to the same pointer in the offset table. Offset table is again mutable where we are free to rewrite pointers, however since the machine code now points at an immutable jump table we keep the codegen largely unaware of incremental linking. It would look something like this:
|
FWIW the obvious con of option 4* is having to (re-)introduce two more sections |
I think we should toss out option 1 purely on a performance basis:
The discussion right now is jump table vs distributed jump table. I don't think anyone is advocating for offset table. I'm not sure why you are bringing up Option 4? In my earlier comments I used these terms:
|
Oh I brought up option 4 to point out yet another mechanism at our disposal in case it wasn't clear how an offset table can be utilised while making codegen agnostic to the concept of indirection. |
If you pack all of the jumps together, you enforce that every function call requires a minimum of 2 cache references, 3 if the jump instruction itself crosses a cache boundary (since 5 is not a power of 2, and pretty terrible since it would happen on a whopping 6% of functions). In order to fully utilize the first cache reference, you have to get lucky with using all the adjacent jump entries at around the same time, which is not even slightly reasonable to expect. If you prefix the majority of non-changing functions with their own jump, then you allow them to only require a single cache reference that is fully utilized for all but the most trivial of functions. Saying that this space goes to waste after moving the function is misleading because it could be filled with any other function that fits in that space, in the same way that unused jump entries can be filled with any other jump. Even in the split case, you only have to get lucky with two functions being in use at the same time to fully utilize any given cache. Increasing the minimum cost of all functions is clearly inferior to only increasing the cost of some changing functions (those rapidly increasing in size). This is even more true when you realize that once a split function stops changing over a period of time, it can be reunited in a "garbage collection"-like manner, restoring the original performance after you are happy with the new function implementation and stop editing it (in a way that vastly increases size). This trades the cost of updating all references every time a function increases in size, to only doing it once the function has stabilized and stops increasing in size (and can be deferred to batch over many functions when there is time to waste). This is never possible with a jump table because those jumps can never be adjacent to their implementation. 
The bookkeeping seems completely equivalent, for each symbol you either track where the jump table entry is or which possibly not yet named symbol contains the actual implementation (something which can also be trivially recovered by just reading the jump instruction). I'm also not sure why we are intent on removing function pointer alignment from the language, since once a function is aligned to a cache line, there is little benefit to aligning it any more other than gaining bits in the function pointer to be used for other things. It's the alignment of the first jump that matters in the common non-split jump prologue case, aligning the "beginning" of the function would just move it to a different cache line negating any benefits. |
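The cache-line arithmetic behind the "6%" figure above can be checked with a short sketch. It assumes 64-byte cache lines, 5-byte jump entries packed back to back, and a table that starts on a cache-line boundary; the helper names `crossesCacheLine` and `countCrossing` are made up for illustration and do not come from the linker.

```zig
const std = @import("std");

/// True if `len` bytes starting at `offset` spill over a cache-line boundary.
fn crossesCacheLine(offset: usize, len: usize, line_size: usize) bool {
    return (offset % line_size) + len > line_size;
}

/// Counts how many tightly packed entries straddle two cache lines,
/// assuming the table itself starts on a cache-line boundary.
fn countCrossing(entry_size: usize, entry_count: usize, line_size: usize) usize {
    var crossing: usize = 0;
    var i: usize = 0;
    while (i < entry_count) : (i += 1) {
        if (crossesCacheLine(i * entry_size, entry_size, line_size)) crossing += 1;
    }
    return crossing;
}

test "5-byte entries straddle a 64-byte line 4 times out of 64" {
    // 64 entries of 5 bytes span exactly 5 cache lines (320 bytes).
    try std.testing.expectEqual(@as(usize, 4), countCrossing(5, 64, 64));
}
```

Four crossings out of 64 entries is 6.25%, which matches the rough figure quoted in the comment. Entries whose size is a power of two, such as 8-byte pointers, never straddle a line under the same assumptions.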
Thanks for chiming in @jacobly0. I'm convinced by your performance-related arguments.
I don't understand what you're saying here. It seems like you're making an argument against function pointer alignment being useful, which seems to comport with removing the alignment guarantees of function pointers. But your conclusion is that we should keep function pointer alignment guarantees? |
I'm saying that function alignment doesn't have much use at all without function pointer alignment, and so I don't understand the stance that only one should be removed. I'd also argue that its usage is niche enough that theorizing about stubs is not very relevant given that stubs can also be aligned and almost no functions have an explicit alignment. I think Zig already made the correct choice by making function pointers default to |
I see, so, would you be for or against completely removing align(N) syntax from function declarations then? |
I don't have a strong opinion on the outcome, I just have a strong opinion against using the discussion in this thread to justify removal. If other arguments are made for its removal, I could probably be easily convinced, but as of today, it still seems like an experiment worth keeping. |
Understood, thank you for the clarifications. I'm satisfied with the distributed jump table solution then. |
Co-authored-by: Jacob Young <[email protected]>
@@ -2230,7 +2171,7 @@ const riscv = struct {
     const riscv_util = @import("../riscv.zig");
 };

-const ResolveArgs = struct { i64, i64, i64, i64, i64, i64, i64, i64 };
+const ResolveArgs = struct { i64, i64, i64, i64, i64, i64, i64 };
why is this a tuple instead of a struct?
It was like this before but I ought to make it a struct instead of a tuple. Will do that in a follow-up.
Motivating factor: make this feature as transparent to the codegen as possible
Closes #20887
Previously, we would use `.got.zig` to indirect pointers to global data too, but as was agreed on many an occasion, we only really want to indirect function calls via an offset table or similar. In fact, as far as I understand, that was the original plan of @andrewrk when he wrote the first PoC of the incremental ELF linker. Therefore, since we no longer want to indirect pointers to global data, it makes sense to replace the offset table with an equivalent jump (trampoline) table that is directly embedded within the machine code section. This improves code locality and should also make load times shorter when dynamically linking, since we no longer have to rebase any pointers.

The new jump table looks as follows (for x86_64):
Compared to storing pointers, if we have to relocate the table because it outgrew its capacity, we have to recalculate the jump targets, since the jump sequence is PC-relative. However, this reduces to applying a fixed offset to every entry when relocating.
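To illustrate why relocating the table reduces to a fixed adjustment, here is a minimal sketch. It assumes each entry is a single x86_64 `jmp rel32` (opcode `0xE9` followed by a 32-bit little-endian displacement, 5 bytes in total); the actual entry layout used by the PR may differ, and `entry_size`, `jumpDisp` and `encodeJump` are illustrative names rather than linker API.

```zig
const std = @import("std");

/// Assumed entry size: `jmp rel32` is one opcode byte plus a 4-byte displacement.
const entry_size = 5;

/// The displacement is measured from the end of the jump instruction.
fn jumpDisp(entry_addr: i64, target_addr: i64) i32 {
    return @intCast(target_addr - (entry_addr + entry_size));
}

/// Encodes one assumed jump table entry located at `entry_addr`.
fn encodeJump(entry_addr: i64, target_addr: i64) [entry_size]u8 {
    const disp: u32 = @bitCast(jumpDisp(entry_addr, target_addr));
    return .{
        0xE9,
        @as(u8, @truncate(disp)),
        @as(u8, @truncate(disp >> 8)),
        @as(u8, @truncate(disp >> 16)),
        @as(u8, @truncate(disp >> 24)),
    };
}

test "moving the table by delta changes every displacement by -delta" {
    const target: i64 = 0x2000;
    const old_entry: i64 = 0x1000;
    const delta: i32 = 0x400; // the whole table moved 0x400 bytes up
    const old_disp = jumpDisp(old_entry, target);
    const new_disp = jumpDisp(old_entry + delta, target);
    try std.testing.expectEqual(old_disp - delta, new_disp);
}
```

Because the targets stay put while every entry moves by the same amount, each stored displacement can be patched by subtracting that amount, without looking up the target symbols again.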
From the perspective of the codegen, when emitting a call with a relocation, the codegen no longer needs to care about the presence or absence of the jump table/offset table: it simply emits `call rel32` with an `R_X86_64_PLT32` relocation where the target is `symbol_a`. Then the linker, upon resolving `R_X86_64_PLT32`, will check if the jump table has been created and the symbol can be indirected via said table, and rewrite the target address to the jump table entry if so. Again, this is all transparent to the codegen. As an added bonus, codegen now generates identical code in `build-exe` and `build-obj` modes.

One caveat of the new approach is that we only indirect function calls: if you request a function pointer, currently you will receive exactly that, with no indirection.
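The resolution step described above can be sketched roughly as follows. This is a simplified model, not the linker's actual code: the `Symbol` struct, its `jump_entry` field and `resolvePlt32` are hypothetical names. The only part taken from the ELF x86_64 psABI is the general shape of the calculation, target plus addend minus the address of the relocated field.

```zig
const std = @import("std");

/// Hypothetical, simplified symbol; the real linker tracks far more state.
const Symbol = struct {
    /// Address of the function's machine code.
    address: i64,
    /// Address of the symbol's jump table entry, if one has been created.
    jump_entry: ?i64 = null,
};

/// Computes the rel32 value for a call site. `place` is the address of the
/// 4-byte field being patched and `addend` comes from the relocation.
fn resolvePlt32(sym: Symbol, place: i64, addend: i64) i32 {
    // Prefer the jump table entry when the symbol is indirected.
    const target = sym.jump_entry orelse sym.address;
    return @intCast(target + addend - place);
}

test "calls go through the jump table entry when one exists" {
    const direct = Symbol{ .address = 0x3000 };
    const indirect = Symbol{ .address = 0x3000, .jump_entry = 0x1000 };
    // A `call rel32` at 0x2000 has its displacement field at 0x2001;
    // an addend of -4 is typical for this instruction form.
    try std.testing.expectEqual(@as(i32, 0xFFB), resolvePlt32(direct, 0x2001, -4));
    try std.testing.expectEqual(@as(i32, -0x1005), resolvePlt32(indirect, 0x2001, -4));
}
```

The codegen side is the same in both cases: it emits the call and the relocation against the symbol, and only the linker decides which address ends up in the displacement field.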
I am looking forward to feedback on whether we should proceed or not!
TODO
- [ ] riscv-elf trampolines (deferred until we have a working incremental linker)