Add initial IFRT Julia bindings #738

Open
wants to merge 4 commits into
base: main


Collaborator

@mofeing mofeing commented Feb 13, 2025

using Reactant
using Reactant.XLA: IFRT

a = ConcreteRArray(rand(2,2))
code = @code_hlo sin.(a)

@show a sin.(a)

client = IFRT.Client(Reactant.XLA.default_backend[])
exec = IFRT.compile(client, code)

a_ifrt = IFRT.Array(client, a.data[1].buffer)
results = IFRT.execute(exec, (a_ifrt.ptr,), (UInt8(false),), Val(1))

b = zeros(size(a))
IFRT.CopyArrayToHostBuffer(results[1], pointer(b))
@show b

returns

AssertionError("Could not find registered platform with name: \"cuda\". Available platform names are: ")
a = ConcreteRArray{Float64, 2, 1, Reactant.Sharding.FinalizedNoSharding}([0.5407721438548094 0.542745339651497; 0.5957566714113256 0.9478441500987932])
sin.(a) = ConcreteRArray{Float64, 2, 1, Reactant.Sharding.FinalizedNoSharding}([0.5147981128093282 0.5164887528557017; 0.5611352302722601 0.8121595940794706])
b = [0.5147981128093282 0.5164887528557017; 0.5611352302722601 0.8121595940794706]

end
end

mutable struct Buffer
buffer::Ptr{Cvoid}
holded::Ptr{Cvoid}
Member

we should just make a second type HeldBuffer which just contains the heldbuffer pointer, and then for IFRT we shouldn't need to use the old buffer class

Collaborator Author

mmm okay, but we need this HeldBuffer inside Buffer to avoid it being destructed when IFRT is using it and Buffer is GCed.

how about this?

Suggested change
- holded::Ptr{Cvoid}
+ holded::Union{Nothing,HeldBuffer}

@avik-pal avik-pal force-pushed the ss/ifrt-julia-bindings branch from ed714ed to 9a8f9a7 Compare February 13, 2025 15:28
mutable struct Buffer
buffer::Ptr{Cvoid}
held::Union{Nothing,HeldBuffer}
Member

do we need this?

In essence I feel like we should consider buffer and heldbuffer as two totally different API's. Of course under the hood technically one can be converted to the other, but we shouldn't merge them unless needed. This will make it easier to transition the rest of the code from PJRT -> IFRT (and also the inner buffers having a union will make them slow)
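
A small sketch of the typing concern, using hypothetical stand-in types (`HeldSketch`, `MergedBufferSketch`, `PlainBufferSketch` are not part of Reactant): a `Union{Nothing,...}` field is not a concrete field type, which is one way to see what changes when the two buffers are merged. Julia can often split small unions like this one efficiently, so the actual cost is worth measuring rather than assuming.

```julia
# Hypothetical stand-ins for illustration; not Reactant's actual types.
struct HeldSketch
    ptr::Ptr{Cvoid}
end

# Merged design: the held buffer is an optional field of the buffer.
mutable struct MergedBufferSketch
    buffer::Ptr{Cvoid}
    held::Union{Nothing,HeldSketch}
end

# Separate design: the buffer only carries its own raw pointer.
mutable struct PlainBufferSketch
    buffer::Ptr{Cvoid}
end

# A Union field type is not concrete; a plain pointer field is.
merged_is_concrete = isconcretetype(fieldtype(MergedBufferSketch, :held))
plain_is_concrete = isconcretetype(fieldtype(PlainBufferSketch, :buffer))
@show merged_is_concrete plain_is_concrete
```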

Collaborator Author

sure, I agree with your view that it would be cleaner and more performant. but I'm worried about destruction of the objects and having "use after free" bugs. In this transition period, we will have both Buffer and HeldBuffer.

if Buffer takes ownership of it, then the underlying xla::PjRtBuffer will be freed when Buffer is GCed. if we still have a HeldBuffer around, it will be broken because the object that its shared_ptr points to will already be freed.

consider the opposite: HeldBuffer takes ownership of it. the same problem applies: HeldBuffer can be GCed before Buffer and Buffer will be broken because the pointer will already be freed if you try to use it again.

here is the dependency graph: both Buffer and HeldBuffer have references to the same xla::PjRtBuffer object.

flowchart TB
    subgraph Julia
    Buffer
    HeldBuffer
    end
    Buffer --> PjRtBuffer
    HeldBuffer --> shared_ptr
    shared_ptr --> PjRtBuffer
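
The hazard in the graph above can be simulated in plain Julia. This is a simplified sketch, not the PR's code: `OwningBufferSketch`, `OwningHeldSketch` and `fake_free` are hypothetical stand-ins, the address is never dereferenced, and the C++ `shared_ptr` layer is left out. Each wrapper registers its own finalizer for the same native address, so the release routine runs twice.

```julia
# Hypothetical stand-ins for illustration; not Reactant's actual types or C API.
const RELEASES = Ref(0)  # counts calls to the fake release routine
fake_free(::Ptr{Cvoid}) = (RELEASES[] += 1; nothing)

mutable struct OwningBufferSketch  # plays the role of Buffer
    ptr::Ptr{Cvoid}
    function OwningBufferSketch(ptr::Ptr{Cvoid})
        b = new(ptr)
        finalizer(x -> fake_free(x.ptr), b)
        return b
    end
end

mutable struct OwningHeldSketch  # plays the role of HeldBuffer
    ptr::Ptr{Cvoid}
    function OwningHeldSketch(ptr::Ptr{Cvoid})
        h = new(ptr)
        finalizer(x -> fake_free(x.ptr), h)
        return h
    end
end

native = Ptr{Cvoid}(UInt(0x1000))  # pretend address, never dereferenced
buf = OwningBufferSketch(native)
held = OwningHeldSketch(native)

finalize(buf)   # as if the GC collected the Buffer first
finalize(held)  # the second wrapper releases an object that is already gone
@show RELEASES[]
```

With a real native object the second release would be a double free, and any use of `held` between the two calls would be a use after free. Either giving exactly one wrapper a finalizer or never aliasing the same object from both wrappers avoids it.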

Collaborator Author

IMO in order to avoid these issues, the first implementation was best:

  • it avoids double-free and use-after-free issues
  • both HeldBuffer and Buffer are separate types which makes incremental transition easier
  • no overheads

Member

we should never have a heldbuffer which points to the same data as a regular buffer. Each should be considered an exclusive owner of its underlying data.

And analogously we should avoid making a held buffer out of an existing pjrt buffer

Member

that way there would never be any use after free issues

Collaborator Author

that means that we would be duplicating data, because PjRtBuffer holds raw data, which will be costly

mmmm or we will need to replicate stuff for HeldBuffer. lemme try this weekend

@@ -1,12 +1,13 @@
mutable struct Client
client::Ptr{Cvoid}
global_ordinals::Vector{Cint}
holded::Ptr{Cvoid}
Member

same comment here as above about having a separation

ideally we have
abstract type Client end

struct PJRTClient <: Client
   ...
end
struct IFRTClient <: Client
   ...
end
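
A runnable version of that layout might look like the sketch below. The `*Sketch` names, the single `client` field and the `runtime_name`/`describe` helpers are assumptions made for illustration; the real structs would carry whatever fields each runtime needs.

```julia
# Hypothetical sketch of the suggested hierarchy; not Reactant's actual types.
abstract type AbstractClientSketch end

struct PJRTClientSketch <: AbstractClientSketch
    client::Ptr{Cvoid}  # would point at the PJRT client
end

struct IFRTClientSketch <: AbstractClientSketch
    client::Ptr{Cvoid}  # would point at the IFRT client
end

# Runtime-specific behaviour is selected by dispatch on the concrete type.
runtime_name(::PJRTClientSketch) = "PJRT"
runtime_name(::IFRTClientSketch) = "IFRT"

# Shared code is written once against the abstract type.
describe(c::AbstractClientSketch) = "client backed by " * runtime_name(c)

println(describe(PJRTClientSketch(C_NULL)))
println(describe(IFRTClientSketch(C_NULL)))
```

Code that does not care which runtime it talks to accepts the abstract type, while each concrete struct keeps only concretely typed fields.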
