-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33592: [C++] support casting nullable fields to non-nullable if there are no null values #43782
base: main
Are you sure you want to change the base?
GH-33592: [C++] support casting nullable fields to non-nullable if there are no null values #43782
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
|
52100dd
to
49d5a99
Compare
This PR looks pretty good to me, what additional help would you like getting it merged? |
Thanks for the help! I'm a random contributor, so I'm not sure about the exact workflow, but I think both getting the CI run approved and having a C++ owner approve it are needed. Any more detailed thoughts on if the tests are adequate, if there's anywhere else in the code base that you think needs to change, my handling of the go implementation by just deleting the shortcircuit, or any other more detailed thoughts? Basically anything that would reduce the mental load on the code owner I think would increase the odds they approve it :) |
You can mark as ready for review? Or this is still wip? |
Oops, didn't mean for this to still be marked WIP. Looks like a few failures that need to get fixed, but still a high-level review of the general approach would still be appreciated in the meantime. |
You probably need the same corresponding check on the Go side to verify that it's only allowed if there are no nulls. Also, the Go implementation has been moved to the apache/arrow-go repository. So please file the PR there for the Go side. Sorry for the confusion, we just haven't removed the Go code from here yet. |
Thanks @zeroshade , will do. Do we need both PRs to land at about the same time, or can I do that independently? I will undo my changes to the go code here, leaving it untouched. |
They can land independently, no issues there. Thanks! |
49d5a99
to
eb9e7b6
Compare
Just pushed a new version:
We will see if this passes CI now... |
eb9e7b6
to
e6d54d3
Compare
Pushed a new version, hopefully that fixed the broken tests:
|
@zeroshade I think this is ready to review/merge, the failing CI runs look like flakes when trying to setup the environment? |
Can we move this to apache/arrow-go? |
Ah, the Go part was removed from this PR. |
e6d54d3
to
106e627
Compare
fdb99d1
to
1b2c940
Compare
The current CI failures look to be caused by #44985 (or similar), not by this PR. I rebased on main, everything looks pretty much the same as before. @zeroshade can you give this a review when you get the chance, or let me know how I can make it easier for you to do so? @rustyconover if you want to take a look again, maybe that will help. Thank you both! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small question about null count checking.
if (field_index == kFillNullSentinel) { | ||
ARROW_ASSIGN_OR_RAISE(auto nulls, | ||
MakeArrayOfNull(target_type->GetSharedPtr(), batch.length)); | ||
out_array->child_data.push_back(nulls->data()); | ||
} else { | ||
const auto& values = (in_array.child_data[field_index].ToArrayData()->Slice( | ||
in_array.offset, in_array.length)); | ||
if (!target_field->nullable() && values->null_count > 0) { | ||
return Status::TypeError("field '", target_field->name(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, should we discard the buffer[0]
if new data doesn't contains any nulls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mapleFU can you elaborate a little more? I don't think I understand neither why we should do this, or how. Does buffer[0] contain the null bitmap, and if there are no nulls then we can get away without storing one??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore me, i previously think we can
auto values = (in_array.child_data[field_index].ToArrayData()->Slice(
in_array.offset, in_array.length));
if (out field not nullable but in field nullable) {
if (values->GetNullCount() > 0) {
return error
}
values->buffers[0].reset();
}
But I also think this doesn't matter here
f159275
to
40ae139
Compare
40ae139
to
ed9b368
Compare
ed9b368
to
56c0d88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General LGTM!
56c0d88
to
9016a83
Compare
whoops, I accidentally force-pushed with no changes, which closed the PR. Now it's back up |
b124c64
to
ebe9a5a
Compare
@mapleFU as I added the slicing tests, I also refactored the tests to make them much more consistent and concise. Take another look and make sure that your "LGTM" still holds after those changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit!
return Status::Invalid("field '", in_field->name(), "' of type ", | ||
in_field_type->ToString(), | ||
" has nulls. Can't cast to field '", out_field->name(), | ||
"' of non-nullable type ", out_field_type->ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the field that's non-nullable
return Status::Invalid("field '", in_field->name(), "' of type ", | |
in_field_type->ToString(), | |
" has nulls. Can't cast to field '", out_field->name(), | |
"' of non-nullable type ", out_field_type->ToString()); | |
return Status::Invalid("field '", in_field->name(), "' of type ", | |
in_field_type->ToString(), | |
" has nulls. Can't cast to non-nullable field '", | |
out_field->name(), "' of type ", | |
out_field_type->ToString()); |
Notes for myself/fixer:
cmake .. --preset ninja-debug-basic
, thencmake --build . && PYTHON=python ctest -R 'arrow-compute-scalar-cast-test' --output-on-failure
to run the specific testuvx pre-commit run --all-files clang-format