Rust InterOp Architecture Decision: The role of `delta-kernel` and/or `delta-rs` in `delta-dotnet` #79
-
Kernel's FFI types are the future. I haven't spent as much time as I should have investigating the kernel code because the lack of write support is currently a limiter. I've helped with some Python adapters, but that's it.
-
Thanks for typing this up so neatly. I think Delta Kernel will support all the table features. I'm not sure if Delta Kernel will ever help out with an operation like Z Ordering:

```python
from deltalake import DeltaTable

dt = DeltaTable("tmp")
dt.optimize.z_order(["country"])
```

So perhaps Delta Kernel will be useful for all the table features, but delta-rs will still be useful for Z Ordering-type operations. Or perhaps we can just use Delta Kernel and then implement Z Ordering natively in C#.
-
Hey @mightyshazam,
Jotting down some updates, hoping to get your thoughts on some options + questions below.
✏️Updates
I spent a lazy Friday reading through the `Bridge` implementation. First of all, really great work, I especially love the DataFusion support we inherit from `delta-rs`, running SQL on top of Delta via C# is something I never thought I'd see 🙂.
The Tokio runtime passing the Cancellation Token from C# to Rust is super smooth, I love how everything is async through and through, and the unsafe code is neatly tucked away in the `Bridge`.
It's obvious you've put a lot of effort and thoughtfulness into this project, I really appreciate it, it's fantastic for the Dotnet community.
As @MrPowers mentioned earlier, before I knew about `delta-dotnet` I'd been dabbling in the work the `delta-kernel` folks have done, and used a very similar approach to yours to convert the FFI into C# via ClangSharp. I spent a weekend throwing some working C# code together against the `delta-kernel` FFI where you can convert Delta into an Arrow Table, but it's nowhere near the amount of thoughtfulness or effort you've put into the `Bridge`, `Runtime` etc, and the underlying Rust code.
My observation is that, today, `delta-rs` is a lot more feature-rich than `delta-kernel` (e.g. SQL support via DataFusion). I also understand that `delta-rs` has slowly started adopting `delta-kernel` too.
Given that `delta-dotnet` already works just fine for read/write, I'm wondering what your vision is for the project.
There are 2 high-level options I'm seeing.
Option 1: `delta-dotnet` keeps dep on `delta-rs`, but no dep on `delta-kernel`

- Keep the existing dep on `delta-rs` for the foreseeable future.
- As `delta-rs` adopts `delta-kernel`, the Bridge evolves its FFI in lockstep with `delta-rs` and enjoys new `delta-kernel` features anyway.
- Use the saved effort to build more handy features on top of the Bridge that are C# native, e.g. expose a DataFrame C# API (works against Arrow) via `DataFrame.FromArrowRecordBatch`.

✅ Pros:

- `delta-dotnet`, enjoy as `delta-rs` folks do most of the hard work 🙂
- `delta-dotnet` implementation is more or less feature complete for most use cases, just keep in cruise control, new features, etc.
- `delta-rs` - being Rust - is performant as is via `P/Invoke` and hopefully supports multiple writers via partitioning (presumably, I haven't benchmarked yet, didn't see a `delta-dotnet` unit test.)
- `TOML` file you enable that feature, so hopefully there's a method the `Bridge` exposes that takes in params to connect to each cloud storages)

❌ Cons:

- `delta-dotnet` architecture tied to `delta-rs` for good, rather than being tied to `delta-kernel` (more flexible).

Option 2: interface out `delta-dotnet`'s dep on `delta-rs`, and slowly migrate to `delta-kernel` as it matures

- Step 0 - add `Interfaces` to everything under `Table/*.cs`, and slowly migrate to `delta-kernel`, starting with Arrow-based read support which already works. Mark unsupported methods with `NotSupportedException` or something similar, until Kernel supports it.
- Step 1 - When `delta-kernel` adds write-support, adopt that.
- Step 2 - Phase out dependency on `delta-rs`? Of course, we'd need the same SQL capabilities DataFusion adds to not break users. Maybe the Kernel could add support for something similar via scan.

✅ Pros:

- `delta-kernel`

❌ Cons:

- `delta-rs` Parquet -> Arrow reader is rapid already.
- `delta-kernel` support, csproj wrappers, build pipelines, maintenance etc.
- `DataFusion` AKA SQL support.

Would love to hear your thoughts on which Option you prefer, and if there's a 3rd+ logical Option.
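To make Step 0 of Option 2 a bit more concrete, here is a rough sketch of the "interface out the backend" idea. It's written in Python only to match the Z Ordering snippet earlier in the thread; in `delta-dotnet` this would be a C# interface with `NotSupportedException`. Every name here (`TableBackend`, `DeltaRsBackend`, `KernelBackend`, `Table`) is hypothetical, and the backends are in-memory stand-ins rather than real `delta-rs` or `delta-kernel` calls.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class TableBackend(ABC):
    """Hypothetical interface the table type depends on, instead of a concrete engine."""

    @abstractmethod
    def read_rows(self) -> list[dict]:
        ...

    @abstractmethod
    def append_rows(self, rows: list[dict]) -> None:
        ...


class DeltaRsBackend(TableBackend):
    """Stand-in for the existing delta-rs-backed path: read and write both work."""

    def __init__(self) -> None:
        self._rows: list[dict] = []  # in-memory placeholder, not a real Delta table

    def read_rows(self) -> list[dict]:
        return list(self._rows)

    def append_rows(self, rows: list[dict]) -> None:
        self._rows.extend(rows)


class KernelBackend(TableBackend):
    """Stand-in for a delta-kernel-backed path: read-only until writes are supported."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = list(rows)

    def read_rows(self) -> list[dict]:
        return list(self._rows)

    def append_rows(self, rows: list[dict]) -> None:
        # Python analogue of throwing NotSupportedException in C#.
        raise NotImplementedError("write support is not available on the kernel backend yet")


class Table:
    """Caller-facing type; it only ever talks to the interface."""

    def __init__(self, backend: TableBackend) -> None:
        self._backend = backend

    def read(self) -> list[dict]:
        return self._backend.read_rows()

    def append(self, rows: list[dict]) -> None:
        self._backend.append_rows(rows)
```

The point of the shape is that callers of `Table` never change: swapping `DeltaRsBackend` for `KernelBackend` is a constructor argument, and unsupported operations fail loudly instead of silently doing nothing.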
I'm personally happy to support implementing both options, especially Option 2, since I have a bunch of context on the Kernel from digging into it, but only if there's a clear value-add in this new architecture.
Option 1 doesn't really need "new architecture work" (you've done it already), but we could focus on pushing the limits of `delta-dotnet` and enabling new use cases, like beefing up the examples folder for Azure Blob etc, using `TokenCredential`, patterns for `N` parallel threads writing to the same table against `N` partitions, samples on how to deserialize Event Hubs/Kafka partitions (basically KDI in C#), ensuring things work in a container without weird problems, adding a `DataFrame` API to extend `Table`, and so on.