feat: Add Delta Kernel FFI support for read path and expose Apache.Arrow.Table and Microsoft.Data.Analysis.DataFrame read interface #89
Using cargo to recursively generate FFI headers from the repo was near impossible for my Rust skill level 😄
…port context objects
src/DeltaLake/Kernel/Arrow/Extensions/ArrowContextExtensions.cs
The methods don't have to be synchronous. Technically, C is always synchronous. However, any method that takes a callback function can be wrapped with a
Makes sense @mightyshazam - I had the incorrect understanding that this works through and through due to the Tokio wrapper. But I'm guessing the idea is the Cancellation Token would be respected by that
Also, do you prefer I get this TSC wrapper done before merging this PR, so we change the
Or should we have the synchronous methods as the PR currently has, and expose additional async methods?
We can change the methods to async later. As long as we don't increment the version of the package, it won't push a new one.
@mightyshazam - yeap good point, sounds good, I added this issue to track this:
Will get it done in a separate smaller PR
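For reference, a minimal sketch of how a callback-taking call can be exposed as an async method. The wrapper type named in the comment above did not survive on this page, so using `TaskCompletionSource<T>` here is an assumption, as are the names `ReadVersionWithCallback` and `ReadVersionAsync` (a managed stand-in is used in place of a real native call).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallbackAsyncSketch
{
    // Hypothetical stand-in for a native call that reports its result
    // through a callback. It is not part of this PR or of Delta Kernel.
    public static void ReadVersionWithCallback(Action<long> onDone)
    {
        ThreadPool.QueueUserWorkItem(_ => onDone(42));
    }

    // Wraps the callback-style call in a Task the caller can await.
    public static async Task<long> ReadVersionAsync(
        CancellationToken cancellationToken = default)
    {
        var tcs = new TaskCompletionSource<long>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Cancelling the token completes the Task as canceled. It does not
        // stop the underlying call, which keeps running to completion.
        using (cancellationToken.Register(
            () => tcs.TrySetCanceled(cancellationToken)))
        {
            // TrySetResult is a no-op if the Task was already canceled.
            ReadVersionWithCallback(value => tcs.TrySetResult(value));
            return await tcs.Task.ConfigureAwait(false);
        }
    }
}
```

In this sketch the token only abandons the wait. Whether the native side can observe cancellation is a separate question and is not shown here.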
Why this change is needed
This PR adds Delta Kernel FFI based read support.
Closes issue: #82
How
Delta Kernel integration
Adds delta-kernel-rs as a pinned submodule. Uses the same structure as Bridge to generate the FFI + Rust binary.
Adds interfaces for the user-facing entry points and sets up some simple model relationships:
  DeltaEngine as IEngine and DeltaTable as ITable
  Kernel overrides Bridge and falls back to it when an implementation is missing
  Bridge Runtime/Table serve as the base class for Kernel, which overrides them with the subset of read methods Kernel exposes
Implements Kernel FFI interop, most of which is pointer management.
Adds an Arrow.Table and DataFrame method to expose the Kernel-scanned RecordBatches** in queryable form.
Adds write concurrency and read concurrency tests to find any memory management problems and test resiliency.
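As an illustration of the pointer management mentioned above, here is a minimal sketch of one common .NET pattern: owning a native pointer through a `SafeHandle` so it is released exactly once. `NativeBufferHandle` is a hypothetical type, and `Marshal.AllocHGlobal` stands in for a pointer handed back over FFI. The PR's actual interop code may be structured differently.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical handle type for illustration only.
public sealed class NativeBufferHandle : SafeHandle
{
    public NativeBufferHandle(int byteCount)
        : base(IntPtr.Zero, ownsHandle: true)
    {
        // Stand-in for a native allocation returned by an FFI call.
        SetHandle(Marshal.AllocHGlobal(byteCount));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Called once by the runtime when the handle is disposed or finalized.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }

    // Small helpers so the sketch can be exercised without unsafe code.
    public void WriteInt32(int value) => Marshal.WriteInt32(handle, value);

    public int ReadInt32() => Marshal.ReadInt32(handle);
}
```

A caller would typically hold such a handle in a `using` block, so the native memory is freed even if a read throws.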
Misc
bootstrap-dev-env.sh: idempotent script to quickly get a dev env up and running that can run Unit Tests + Example project (e.g. using a throwaway WSL box)
Test
Add a unit test that tests read/write concurrency - 27 concurrent writers, 50 readers.
Stress tested the new read/write concurrency unit test across 2800+ loops on Windows + Linux overnight:
Windows
Linux
Ran cloud write example project (Azure Storage Account)
Tested NuGet package pipeline and all targets with cross - sample run
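For readers who want the shape of the read/write concurrency test without opening the diff, a rough sketch follows. It assumes an in-memory `FakeTable` stand-in rather than the library's table type, and only the writer and reader counts (27 and 50) come from the description above. Everything else, including `rowsPerWriter`, is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencySketch
{
    // Hypothetical in-memory stand-in for a table; not the library's API.
    private sealed class FakeTable
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
        private readonly List<int> _rows = new List<int>();

        public void Append(int row)
        {
            _lock.EnterWriteLock();
            try { _rows.Add(row); }
            finally { _lock.ExitWriteLock(); }
        }

        public int Count()
        {
            _lock.EnterReadLock();
            try { return _rows.Count; }
            finally { _lock.ExitReadLock(); }
        }
    }

    // Runs writers and readers concurrently and returns the final row count.
    public static async Task<int> RunAsync(
        int writers = 27, int readers = 50, int rowsPerWriter = 10)
    {
        var table = new FakeTable();

        var writerTasks = Enumerable.Range(0, writers).Select(w => Task.Run(() =>
        {
            for (int i = 0; i < rowsPerWriter; i++)
            {
                table.Append(w * rowsPerWriter + i);
            }
        }));

        var readerTasks = Enumerable.Range(0, readers).Select(r => Task.Run(() =>
        {
            // A reader may observe any prefix of the writes, never more.
            int count = table.Count();
            if (count < 0 || count > writers * rowsPerWriter)
            {
                throw new InvalidOperationException("row count out of range");
            }
        }));

        await Task.WhenAll(writerTasks.Concat(readerTasks)).ConfigureAwait(false);
        return table.Count();
    }
}
```

Looping a test of this shape many times, as in the overnight stress run above, is what gives intermittent memory-management bugs a chance to surface.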