Proposal: `Pair` iterator to create tables with keys and to implement groupby/groupreduce
#139
Comments
Yeeeesss! I was just going to suggest this a couple of days ago (again, actually, see queryverse/IterableTables.jl#26, which I believe is the same idea?).
Although I think the iterators shouldn't return pairs. I think they should just continue to return named tuples, otherwise the syntax for most operations would get really weird. So I'm a huge fan of having
Yes, that's the same as your issue from last year :) I also agree that one should iterate convert(NextTable, ColumnsPair)
convert(NDSparse, ColumnsPair) that do not copy the data.
Yes, that is a cool idea. Another option once 0.7 is out would be to have a custom row type that implements
I think it helps to go back to the purpose of Columns: to provide Columnar storage for array of structs. So I think we should use
this would become
Sounds fine. However I agree that iteration should avoid
Sounds good. Thanks for opening this great issue!
Good point,
I wonder what should EDIT: oops, I've just seen that you're recommending the same above... We should probably go for the
Good question. Yeah it seems
The only annoyance is that for some reason there is no method UPDATE: after some experimenting it seems that storing as a pair of
Can be closed now, right?
I think it's still missing the
Superseded by #151
I've thought more about how to implement all the various IndexedTables operations in an "iterator" way, and it seems to me that `collect_columns` is not quite sufficient, as most operations will be iterating both keys and values. Even for `map` it would be nice to have a way to denote that some columns of the output will be primary.

Here is my proposal:

- Special case `collect_columns` to the case where the `eltype` is `Pair{T1, T2} where {T1<:Tup, T2}`, which would be collected as a `Pair` of `Columns` (or a `Pair` of `Columns` and `Array` if `T2` is not a `Tup`). This is not too hard to do: one just needs to overload `Base.setindex!`, `Base.push!` and `Base.eltype` for `Pair{<:Columns, <:Any}` (which I hope is not type piracy, as we own `Columns`).
- If necessary, add some methods `convert(::Type{NextTable}, ::Pair{<:Columns, <:Any})` and `convert(::Type{NDSparse}, ::Pair{<:Columns, <:Any})`.
- Reimplement `groupby`, `groupreduce` and `join` using iterators of `Pair`s of `Tup` and `collect_columns`.
- In `map`, accept functions that return a `Pair` of `Tup`, in which case collect as a table with primary keys.
- In the `table(iterator)` constructor, accept iterators that iterate `Pair`s of `Tup`, in which case collect as a table with primary keys.
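To make the idea concrete, here is a rough, Base-only sketch of what collecting an iterator of `key => value` pairs into a pair of column-wise containers could look like, plus a toy `groupreduce` built on top of it. This is not the IndexedTables implementation: `columnize`, `collect_pairs` and `groupreduce_pairs` are hypothetical names, plain tuples of vectors stand in for `Columns`, and only `Tuple` keys (not named tuples) are handled.

```julia
# Hypothetical helpers for illustration only; none of these names exist in
# IndexedTables. A tuple of plain vectors stands in for `Columns`.

# Split a vector of same-length tuples into a tuple of column vectors.
function columnize(rows::AbstractVector{<:Tuple})
    n = isempty(rows) ? 0 : length(first(rows))
    return ntuple(i -> getindex.(rows, i), n)
end

# Anything that is not a vector of tuples is kept as a plain array.
columnize(v::AbstractVector) = v

# Collect an iterator of `key => value` pairs as `key_columns => value_columns`.
function collect_pairs(itr)
    ps = collect(itr)
    return columnize(first.(ps)) => columnize(last.(ps))
end

# Toy groupreduce: fold values that share a key with `f`, then collect the
# reduced pairs in sorted key order (assumes the keys are sortable).
function groupreduce_pairs(f, itr)
    acc = Dict{Any,Any}()
    for (k, v) in itr
        acc[k] = haskey(acc, k) ? f(acc[k], v) : v
    end
    return collect_pairs(k => acc[k] for k in sort!(collect(keys(acc))))
end

# Keys are 2-tuples, values are scalars.
ks, vs = collect_pairs(((k, 2k) => k^2 for k in 1:3))
# ks == ([1, 2, 3], [2, 4, 6]) and vs == [1, 4, 9]

grouped = groupreduce_pairs(+, [(:a,) => 1, (:b,) => 2, (:a,) => 3])
# grouped == (([:a, :b],) => [4, 2])
```

The resulting `Pair` already separates primary-key columns from value columns, which is the information a `table(iterator)` constructor or a `convert` method would need to build a keyed table without guessing which columns are primary.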