Arrow c data interface #79
Very cool! Just some notes from writing a few things along these lines in nanoarrow 🙂
include/sparrow/c_interface.hpp
Outdated
*/
template <template <typename> class Allocator>
requires sparrow::allocator<Allocator<char>>
std::unique_ptr<ArrowSchema> make_arrow_schema(
I would highly recommend avoiding std::unique_ptr<ArrowSchema/ArrowArray>, unless used with a deleter that ensures that any unreleased array that it is pointing to is released when the unique pointer is deleted. In cudf they use a unique pointer with a deleter:
...in nanoarrow we have a UniqueArray/Schema class that wraps a stack-allocated ArrowArray/Schema:
..but the idea is the same: always make a home for the struct such that it is guaranteed to be cleaned up, and "move" it between safe homes (e.g., using something like https://github.com/apache/arrow-nanoarrow/blob/a0632177c27cb52880c62533fcd441613f41815b/src/nanoarrow/nanoarrow_types.h#L334-L340 ).
(Apologies if there's some C++ magic going on here that already ensures this)
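For illustration, here is a minimal sketch of the deleter idea described above. The struct layout follows the Arrow C data interface; `releasing_deleter`, `owned_array`, `demo_release`, `make_demo_array`, and the `g_release_calls` counter are hypothetical names made up for this example and are not part of sparrow, cudf, or nanoarrow.

```cpp
#include <cstdint>
#include <memory>

// Layout as given in the Arrow C data interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical deleter: calls the release callback if the array has not
// been released yet, then frees the struct itself.
struct releasing_deleter {
  void operator()(ArrowArray* array) const noexcept {
    if (array == nullptr) {
      return;
    }
    if (array->release != nullptr) {
      array->release(array);
    }
    delete array;
  }
};

using owned_array = std::unique_ptr<ArrowArray, releasing_deleter>;

// Demo-only producer side: counts how often release is invoked.
int g_release_calls = 0;

void demo_release(ArrowArray* array) {
  ++g_release_calls;
  array->release = nullptr;  // the spec requires marking the struct released
}

owned_array make_demo_array() {
  owned_array array(new ArrowArray{});  // value-initialized: all fields zero
  array->release = &demo_release;
  return array;
}
```

With a deleter like this, letting an `owned_array` go out of scope both invokes the producer's release callback and frees the struct, so an early return or exception cannot leave an unreleased array behind.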
Sorry for the delay in replying.
Thank you for your comments. The PR has been improved and now uses a deleter 👍
include/sparrow/c_interface.hpp
Outdated
child->release(child);
SPARROW_ASSERT_TRUE(child->release == nullptr);
}
delete child;
All of the implementations of an ArrowArray/ArrowSchema I've read choose to own their own memory for the ArrowArray/ArrowSchema structs. In this case, it might mean an std::move() of the std::vector<std::unique_ptr<ArrowArray>> into the private data and relying on the C++ deleter to call the release callbacks (see my comment below about always making sure that std::unique_ptr<ArrowThing> is guaranteed to call the release callback if present).
Alternatively, one could use new ArrowArray[] and something like https://github.com/apache/arrow-nanoarrow/blob/a0632177c27cb52880c62533fcd441613f41815b/src/nanoarrow/nanoarrow_types.h#L334-L340 to "move" the struct from foreign memory.
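As a rough sketch of what "moving" a struct out of foreign memory can look like: the Arrow C data interface allows a consumer to move a struct by bitwise copy, provided the source is then marked as released without calling its release callback. `move_array` and `noop_release` below are hypothetical names for this example; this is not the nanoarrow helper itself.

```cpp
#include <cstdint>
#include <cstring>

// Layout as given in the Arrow C data interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical helper: bitwise-copies src into dst, then marks src as
// released. The release callback is NOT called, because ownership of the
// underlying data now belongs to dst.
void move_array(ArrowArray* src, ArrowArray* dst) {
  std::memcpy(dst, src, sizeof(ArrowArray));
  src->release = nullptr;
}

// Demo-only release callback that just marks the struct released.
void noop_release(ArrowArray* array) {
  array->release = nullptr;
}
```

After `move_array(&src, &dst)`, only `dst` carries a non-null `release`, so exactly one owner remains responsible for calling it.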
Would it be possible to split the declaration and implementation of public API functions? Ideally all the public API functions should be at the top of the file, so one can quickly check the API without scrolling through the whole file.
extern "C"
I think it may still be necessary to include:
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
...to prevent a duplicate definition in the case where somebody does:
#include <adbc.h>
#include <sparrow/c_interface.hpp>
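To make the effect of the guard concrete, here is a small sketch in which the same guarded definitions appear twice in one translation unit, as they would if two headers both shipped the Arrow C structs. The first copy stands in for a header that already defines the structs under the same guard, the second for this header. The `ARROW_FLAG_*` macros that normally sit inside the guard are left out for brevity.

```cpp
#include <cstdint>

// First copy: stands in for another header that ships the structs.
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif  // ARROW_C_DATA_INTERFACE

// Second copy: stands in for this header. The guard macro is already
// defined, so the preprocessor skips this block and no redefinition
// error occurs.
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif  // ARROW_C_DATA_INTERFACE
```

Without the guard around the second copy, the compiler would reject the translation unit with a redefinition error for both structs.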
Definitely, I fixed that
I should have noticed this on the first round, but if one includes adbc.h first, the helpful initializers you've added to the definition won't be there (I am not sure if this affects the code you've written here, but it might have unexpected consequences for users of this header).
include/sparrow/c_interface.hpp
Outdated
int64_t null_count,
int64_t offset,
const R& buffer_sizes,
std::vector<std::unique_ptr<ArrowArray>>&& children,
Should this be std::vector<std::unique_ptr<ArrowArray, arrow_array_custom_deleter>>? (It may help reduce the number of lines of code where an exception or early return results in leaked ArrowArrays)
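A sketch of why the deleter type matters for the children vector: if a function takes ownership of the children and then throws, each child still gets released during stack unwinding. The name `arrow_array_custom_deleter` comes from the comment above, but its body here is an assumption, and `array_ptr`, `make_child`, `build_parent`, `counting_release`, and `g_released` are hypothetical names for this example only.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Layout as given in the Arrow C data interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Assumed behaviour: release if still unreleased, then free the struct.
struct arrow_array_custom_deleter {
  void operator()(ArrowArray* array) const noexcept {
    if (array != nullptr && array->release != nullptr) {
      array->release(array);
    }
    delete array;
  }
};

using array_ptr = std::unique_ptr<ArrowArray, arrow_array_custom_deleter>;

// Demo-only bookkeeping to observe release calls.
int g_released = 0;

void counting_release(ArrowArray* array) {
  ++g_released;
  array->release = nullptr;
}

array_ptr make_child() {
  array_ptr child(new ArrowArray{});
  child->release = &counting_release;
  return child;
}

// Hypothetical builder that fails after taking ownership of the children.
void build_parent(std::vector<array_ptr>&& children, bool fail) {
  std::vector<array_ptr> owned = std::move(children);
  if (fail) {
    // `owned` is destroyed during unwinding, releasing every child.
    throw std::runtime_error("validation failed");
  }
}
```

With plain `std::unique_ptr<ArrowArray>` the structs would be freed on the error path, but their release callbacks would never run, so the buffers they reference would leak.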
You are right. Fixed
include/sparrow/c_interface.hpp
Outdated
std::string_view name,
std::optional<std::span<char>> metadata,
std::optional<ArrowFlag> flags,
std::vector<std::unique_ptr<ArrowSchema>>&& children,
Should this be std::vector<std::unique_ptr<ArrowSchema, arrow_schema_custom_deleter>>? (It may help reduce the number of lines of code where an exception or early return results in leaked ArrowSchemas)
…ly always use custom deleter
Awesome!
Fix #69