setinlined not working in JIT mode on LLVM >= 17 #671

Open
Tracked by #632
norcalli opened this issue Jul 20, 2024 · 5 comments

@norcalli

terra-Linux-x86_64-094c5ad (1.1.1)

❯ ~/works/3rd/terra-Linux-x86_64-094c5ad/bin/terra ./tests/ainline.t
definition      {} -> int32
define dso_local i32 @"$bar"() {
entry:
  %puts.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str, i64 0, i64 0))
  %puts1.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.1, i64 0, i64 0))
  %puts2.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.2, i64 0, i64 0))
  %puts3.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.3, i64 0, i64 0))
  %puts4.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.4, i64 0, i64 0))
  %puts5.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.5, i64 0, i64 0))
  %puts6.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.6, i64 0, i64 0))
  %puts7.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.7, i64 0, i64 0))
  %puts8.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.8, i64 0, i64 0))
  %puts9.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.9, i64 0, i64 0))
  ret i32 4
}
assembly for function at address 0x72665b91d000
0x72665b91d000(+0):             push    rbx
0x72665b91d001(+1):             movabs  rdi, 125783948509184
0x72665b91d00b(+11):            movabs  rbx, 125783943755728
0x72665b91d015(+21):            call    rbx
0x72665b91d017(+23):            movabs  rdi, 125783948509197
0x72665b91d021(+33):            call    rbx
0x72665b91d023(+35):            movabs  rdi, 125783948509210
0x72665b91d02d(+45):            call    rbx
0x72665b91d02f(+47):            movabs  rdi, 125783948509223
0x72665b91d039(+57):            call    rbx
0x72665b91d03b(+59):            movabs  rdi, 125783948509236
0x72665b91d045(+69):            call    rbx
0x72665b91d047(+71):            movabs  rdi, 125783948509249
0x72665b91d051(+81):            call    rbx
0x72665b91d053(+83):            movabs  rdi, 125783948509262
0x72665b91d05d(+93):            call    rbx
0x72665b91d05f(+95):            movabs  rdi, 125783948509275
0x72665b91d069(+105):           call    rbx
0x72665b91d06b(+107):           movabs  rdi, 125783948509288
0x72665b91d075(+117):           call    rbx
0x72665b91d077(+119):           movabs  rdi, 125783948509301
0x72665b91d081(+129):           call    rbx
0x72665b91d083(+131):           mov     eax, 4
0x72665b91d088(+136):           pop     rbx
0x72665b91d089(+137):           ret

terra-Linux-x86_64-cc543db (1.2.0)

❯ ~/works/3rd/terra-Linux-x86_64-cc543db/bin/terra ./tests/ainline.t
definition      {} -> int32
define dso_local i32 @"$bar"() {
entry:
  tail call void @"$foo"()
  ret i32 4
}
assembly for function at address 0x7781c8e9f000
0x7781c8e9f000(+0):             push    rax
0x7781c8e9f001(+1):             movabs  rax, 131399305261088
0x7781c8e9f00b(+11):            call    rax
0x7781c8e9f00d(+13):            mov     eax, 4
0x7781c8e9f012(+18):            pop     rcx
0x7781c8e9f013(+19):            ret

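I noticed this in the disassembly of one of my personal projects: setinlined no longer seems to inline anything. I checked against the existing test (tests/ainline.t) and it shows the same behavior. As the output above shows, 1.1.1 inlines foo into bar, while 1.2.0 emits a call to foo instead.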

@norcalli
Author

If I find the time, I can try to bisect it, but maybe it's just a result of the LLVM change.

@elliottslaughter
Member

Yeah, I was wondering if this was going to come bite us.

LLVM 17 completely removed the old optimization pipeline. Therefore, I had to make a hard switch over with that LLVM version to the new optimization pipeline. In the process, I had to remove the manual inliner. Maybe it's possible to adapt, but to be honest the new optimization pipeline is pretty undocumented and even the stuff I've tried to do so far has been pretty inscrutable.

There is some good news: this impacts JIT mode only. If you use terralib.saveobj you'll see the function get inlined as expected.

Add the following to the bottom of tests/ainline.t:

print(terralib.saveobj(nil, "llvmir", {bar=bar}))

Then you'll see it print out:

Output of running Terra on LLVM 18 with AOT mode
; ModuleID = 'terra'
source_filename = "terra"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin23.5.0"

@str.9 = private unnamed_addr constant [13 x i8] c"hello, world\00", align 1

; Function Attrs: nofree nounwind
define dso_local noundef i32 @bar() local_unnamed_addr #0 {
entry:
  %puts.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts1.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts2.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts3.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts4.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts5.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts6.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts7.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts8.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts9.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  ret i32 4
}

; Function Attrs: nofree nounwind
declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #0

attributes #0 = { nofree nounwind }

So then there are two possible workarounds:

  • Use LLVM 16 or older
  • Use Terra with terralib.saveobj to generate either .o files or binaries/libraries directly
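
A minimal sketch of the second workaround, assuming Linux and a Terra build whose terralib.saveobj accepts the "sharedlibrary" file type. The output name libbar.so and the name bar_aot are hypothetical, and foo/bar are simplified stand-ins for the functions in tests/ainline.t. It compiles bar ahead of time and loads the result back into the running process, so treat it as an illustration rather than a tested recipe:

```terra
-- Sketch only: sidestep the JIT pipeline by compiling ahead of time
-- and loading the resulting shared object back into this process.
local C = terralib.includec("stdio.h")

terra foo()
  C.puts("hello, world")
end
foo:setinlined(true)  -- request inlining at every call site

terra bar() : int
  foo()
  return 4
end

-- AOT path: per the discussion above, saveobj still honors setinlined.
-- "libbar.so" is a hypothetical output name (Linux naming assumed).
terralib.saveobj("libbar.so", "sharedlibrary", { bar = bar })

-- Load the shared object and bind the exported symbol as a Terra function.
terralib.linklibrary("./libbar.so")
local bar_aot = terralib.externfunction("bar", {} -> int)

print(bar_aot())
```

The cost of this route is an extra trip through the system linker and a file on disk, which is why it only makes sense for functions where inlining actually matters.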

@norcalli
Author

Ah I see, that makes sense. I think I can live with that workaround, although it would probably be prudent to print a warning or something for setinlined until the JIT pipeline supports it.

On a tangential but related note, I was wondering if there is an easy way to load or link against .o files directly from Terra, without first turning them into an archive or library. That seems more relevant now that saveobj produces different code than the JIT.

For this case, though, I can produce LLVM bitcode and link that back in, which works (but has the disadvantage of linking into the global namespace?), like so:

local exports = {main=main}
local O = terralib.linkllvmstring(terralib.saveobj(nil, "bitcode", exports))
for k, v in pairs(exports) do
  print(O:extern(k, v.type))
end

@elliottslaughter
Member

I have produced a set of Linux binaries for 1.2.0 with LLVM 16, and attached them to the release with a note about this as a known issue. It looks ok on my end, but please check to see if this works for you:

https://github.com/terralang/terra/releases/download/release-1.2.0/terra-Linux-x86_64-cc543db-llvm16.tar.xz

As for loading .o files: terralib.linklibrary ultimately boils down to a call to llvm::sys::DynamicLibrary::LoadLibraryPermanently. Based on the documentation I would guess that it only works on shared objects, but you're welcome to test it out:

if (sys::DynamicLibrary::LoadLibraryPermanently(luaL_checkstring(L, 1), &Err)) {

https://llvm.org/doxygen/classllvm_1_1sys_1_1DynamicLibrary.html#a53d32d3b3baefdec31d3d94b0586d437

@elliottslaughter changed the title from "setinlined doesn't seem to work after upgrade from terra 1.1.1 to 1.2.0" to "setinlined not working in JIT mode on LLVM >= 17" on Aug 6, 2024

@norcalli
Author

norcalli commented Aug 7, 2024

Yup, it looks like it works, thank you! I did briefly look into the pass system myself to see if I could figure it out, but I gave up, haha. The LLVM docs are as inscrutable as always. I also tried reading the Rust crate docs, just to see if I could make a JIT engine with inlining.
