-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Add append!(::StructVector, iterator::Any) using Tables.isrowtable #117
Conversation
Codecov Report
@@ Coverage Diff @@
## master #117 +/- ##
==========================================
+ Coverage 93.87% 94.54% +0.66%
==========================================
Files 10 10
Lines 392 403 +11
==========================================
+ Hits 368 381 +13
+ Misses 24 22 -2
Continue to review full report at Codecov.
|
Thanks! Yes, the API looks good to me, and this is actually very useful, for example to improve |
src/tables.jl
Outdated
# Input `rows` is a container of rows _and_ satisfies column | ||
# table interface. Thus, we can add the input column-by-column. | ||
table = Tables.columns(rows) | ||
nt = foldl(Tables.columnnames(table); init = NamedTuple()) do nt, name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that this could be a bit heavy on the compiler. Maybe, when iterating pairs name => Tables.getcolumn(table, name)
, one could just do append!(getproperty(s, name), Tables.getcolumn(table, name))
, without having to generate all the intermediate named tuples.
It'd be also good to check that this works for StructArray{<:Tuple}
, I'm not completely sure how Tables.jl works in the case of integer names (rather than symbol names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
integer column names should be fully supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test like this:
StructArrays.jl/test/runtests.jl
Lines 346 to 348 in 8f1ce3f
# Testing integer column "names": | |
@test invoke(append!, Tuple{StructVector,Any}, StructArray(([0],)), StructArray(([1],))) == | |
StructArray(([0, 1],)) |
I used invoke
as I don't know if there is a pre-defined tables with integer column names in Tables.jl. I tried Tables.table(([1, 2], [3, 4]))
and Tables.table([1 2; 3 4])
but they didn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t = Tables.table([1 2; 3 4])
didn't work in the sense that propertynames(t)
is [:Column1, :Column2]
rather than [1, 2]
(even though getproperty(t, 1)
does work).
Since Tables.getcolumn
does not have something equivalent to get(container, key, default)
, there is no exception-safe way to append tables column-by-column, other than checking Tables.columnnames
.
@piever Thanks for having look at it. I added some tests and tried to reflect your comments. |
table = Tables.columns(rows) | ||
isempty(_setdiff(propertynames(s), Tables.columnnames(rows))) || | ||
_invalid_columns_error(s, rows) | ||
_foreach(propertynames(s)) do name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally for this StructArrays uses foreachfield
, but that assumes that also rows
implements getproperty
for column access, which in the latest Tables doesn't need to be the case. What I don't understand is: how is foldl
better than foreach
for tuples? Is it as efficient as the generated function approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at the @code_llvm
or @code_native
for mapfoldl
on tuples, it seems to generate the optimal thing, I'm just a bit confused as to where is the special method for tuples. Nevermind, I now see that it dispatches to Base.afoldl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for merging this. Maybe you've already figured it out why, but the reason why I didn't use foreachfield
there was that AFAICT foreachfield
does not pass property names to the function.
This PR implements:
append!
so that any "row iterator" can be appended to a struct array.Tables.isrowtable
.This is the same API implemented in JuliaData/TypedTables.jl#57. Since this is a PR on top of #116, I'm posting it as a draft.
I'll add some tests if adding this API is OK.