This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

add R_LARCH_JIRL_LO12 #69

Open: wants to merge 1 commit into main

xry111 (Contributor) commented Sep 2, 2022

With the "medium" code model, we call a function with a pair of
PCALAU12I and JIRL instructions. Currently the assembler produces
something like:

   8:	1a00000c 	pcalau12i   	$t0, 0
			8: R_LARCH_PCALA_HI20	g
   c:	4c000181 	jirl        	$ra, $t0, 0
			c: R_LARCH_PCALA_LO12	g

This is problematic:

(1) If we just apply the R_LARCH_PCALA_LO12 relocation following its
definition, we'll get the wrong result: jirl is a 2RI16-type instruction
but R_LARCH_PCALA_LO12 is defined for 2RI12-type instructions.

(2) The linker needs to produce a PLT entry if g is an external
function. Currently ld.bfd assumes that if there is an
R_LARCH_PCALA_HI20 against an external symbol g, then g "should be" an
external function. But this assumption only holds if the programmers
(and the compiler) make no errors. Consider a programmer who moved data
(which is not a function) from one shared object into another, but
forgot to update
la.local $t0, data
ld.d     $t0, $t0, 0

in the code. Now the linker generates a "PLT entry" for data without
any diagnostic, and the ld.d instruction loads two instructions from the
PLT. The programmer can only notice that something is wrong (as a
mysterious crash or bad output) at runtime.
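
To make point (1) concrete, here is a minimal sketch of the field mismatch. It is not linker code: the helper names are hypothetical, the low-12-bit value 0x124 is an arbitrary example, and it assumes the usual encodings (2RI12 immediate in bits [21:10]; jirl offs16 in bits [25:10], shifted left by 2 when executed). Sign handling for low parts at or above 0x800 is deliberately left out.

```python
JIRL_RA_T0 = 0x4C000181  # "jirl $ra, $t0, 0" from the dump above


def apply_as_2ri12(insn, lo12):
    """Patch lo12 into bits [21:10], as a 2RI12-type relocation would."""
    return (insn & ~(0xFFF << 10)) | ((lo12 & 0xFFF) << 10)


def apply_as_jirl(insn, lo12):
    """Patch lo12 >> 2 into the offs16 field, bits [25:10] (illustrative only)."""
    return (insn & ~(0xFFFF << 10)) | (((lo12 >> 2) & 0xFFFF) << 10)


def jirl_byte_offset(insn):
    """Byte offset jirl adds to rj: sign-extended offs16, shifted left by 2."""
    offs16 = (insn >> 10) & 0xFFFF
    if offs16 & 0x8000:
        offs16 -= 0x10000
    return offs16 << 2


lo12 = 0x124  # hypothetical low 12 bits of g's address (4-byte aligned)
wrong = jirl_byte_offset(apply_as_2ri12(JIRL_RA_T0, lo12))
right = jirl_byte_offset(apply_as_jirl(JIRL_RA_T0, lo12))
print(hex(wrong), hex(right))  # 0x490 0x124
```

With the 2RI12 rule the jump lands at four times the intended offset, because jirl scales its immediate by 4 and the relocation does not account for that.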

To avoid those issues, we could add extra semantics to
R_LARCH_PCALA_LO12, like "if it's applied to a jirl instruction, blah
blah...", but that is too tricky and introduces extra complexity into
the linker: ld.bfd is designed to process the link in multiple passes,
and if we load the actual content of the text segment to inspect whether
there is a jirl instruction, the memory footprint of the linker will
increase significantly. So it's better to use a new relocation type for
function calls in the medium code model.
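
As a rough illustration of why a distinct type helps with point (2), the toy model below decides which symbols get a PLT entry from relocation types alone, without reading any instruction bytes. It is only a sketch under stated assumptions: the `Reloc` record and both functions are hypothetical, this is not how ld.bfd is structured, and the exact rule a linker would adopt for R_LARCH_JIRL_LO12 is assumed here rather than specified by this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reloc:
    offset: int
    rtype: str
    symbol: str


def plt_symbols_by_hi20(relocs, external):
    """The heuristic described above: any PCALA_HI20 against an external symbol."""
    return {r.symbol for r in relocs
            if r.rtype == "R_LARCH_PCALA_HI20" and r.symbol in external}


def plt_symbols_by_jirl_lo12(relocs, external):
    """Assumed rule: only a JIRL_LO12 marks the symbol as a call target."""
    return {r.symbol for r in relocs
            if r.rtype == "R_LARCH_JIRL_LO12" and r.symbol in external}


external = {"g", "data"}
relocs = [
    # pcalau12i + jirl calling g, with the proposed type on the jirl
    Reloc(0x8, "R_LARCH_PCALA_HI20", "g"),
    Reloc(0xC, "R_LARCH_JIRL_LO12", "g"),
    # stale "la.local $t0, data" whose target moved to another shared object
    Reloc(0x10, "R_LARCH_PCALA_HI20", "data"),
    Reloc(0x14, "R_LARCH_PCALA_LO12", "data"),
]

print(sorted(plt_symbols_by_hi20(relocs, external)))       # ['data', 'g']
print(sorted(plt_symbols_by_jirl_lo12(relocs, external)))  # ['g']
```

In this model the stale data access no longer gets a silent "PLT entry", so the linker would be in a position to report it instead.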

xen0n (Contributor) commented Jan 22, 2023

For the record: this is causing significant pain for the LLD port, where relocation semantics (the RelExpr enum) have to be determined early and ideally without depending on other relocs or input content. It is somewhat easy to treat R_LARCH_PCALA_LO12 on jirl differently, but it's much more difficult to differentiate between the R_LARCH_PCALA_HI20's that produce an intermediate result for a jirl and those that do not. Unlike RISC-V, where the R_RISCV_PCREL_LO12's actually point to the corresponding HI20 reloc so the correspondence is preserved, we have no such link between the two relocs.
